import arcpy, traceback, sys, os
from dateutil.parser import parse
arcpy.env.overwriteOutput = True
def add_fields(Input_Layer, field_name_list):
    '''Quick, hard-coded way to add fields to a table.
    The example below shows one way to manage field names for easier coding:
    create the field name list in your main code, then paste it here for reference.
# NAMES FOR FIELDS TO ADD
# 0 1 2 3 4 5 6 7 8
    tableheaders = [w_name, d_type, tab_name, col_name, col_type, col_width, col_prec, col_alias, col_join]  # 9 BASE COLS (0-8)
    doCounts = True
    tableheaders.append(col_count) # 9 (COUNT BLOCK ADDS 8 COLS: 9-16)
tableheaders.append(col_pc) # 10
tableheaders.append(contains_zeros) # 11
tableheaders.append(max_len) # 12
tableheaders.append(min_val) # 13
tableheaders.append(max_val) # 14
tableheaders.append(min_date_col) # 15
tableheaders.append(max_date_col) # 16
doDomains = True
    tableheaders.append(dom_flag) # 17 (DOMAIN BLOCK ADDS 4 COLS: 17-20)
tableheaders.append(col_domain) # 18
tableheaders.append(dom_col_type) # 19
tableheaders.append(dom_typ_col) # 20
Create table syntax: CreateTable_management (out_path, out_name, {template}, {config_keyword})
'''
# CREATE VALUE TABLE FOR FIELDS TO ADD
# ADDROW FORMAT: Name Type Length - Length is optional
vtab = arcpy.ValueTable(3)
vtab.addRow(field_name_list[0] + " text 255")
vtab.addRow(field_name_list[1] + " text 15")
vtab.addRow(field_name_list[2] + " text 100")
vtab.addRow(field_name_list[3] + " text 100")
vtab.addRow(field_name_list[4] + " text 15")
vtab.addRow(field_name_list[5] + " Double")
vtab.addRow(field_name_list[6] + " Double")
vtab.addRow(field_name_list[7] + " text 50")
vtab.addRow(field_name_list[8] + " text 255")
if len(field_name_list) in (17, 21):
vtab.addRow(field_name_list[9] + " Double")
vtab.addRow(field_name_list[10] + " Double")
vtab.addRow(field_name_list[11] + " text 5")
vtab.addRow(field_name_list[12] + " Double")
vtab.addRow(field_name_list[13] + " Double")
vtab.addRow(field_name_list[14] + " Double")
vtab.addRow(field_name_list[15] + " date")
vtab.addRow(field_name_list[16] + " date")
if len(field_name_list) == 21:
vtab.addRow(field_name_list[17] + " text 5")
vtab.addRow(field_name_list[18] + " text 100")
vtab.addRow(field_name_list[19] + " text 15")
vtab.addRow(field_name_list[20] + " text 15")
if len(field_name_list) == 13:
vtab.addRow(field_name_list[9] + " text 5")
vtab.addRow(field_name_list[10] + " text 100")
vtab.addRow(field_name_list[11] + " text 15")
vtab.addRow(field_name_list[12] + " text 15")
    # LOOP THROUGH THE VALUE TABLE, ADDING FIELDS
    for x in range(vtab.rowCount):
        if vtab.getValue(x, 1) == "text":
            arcpy.AddField_management(Input_Layer, vtab.getValue(x, 0), vtab.getValue(x, 1), field_length=int(vtab.getValue(x, 2)))
        else:
            arcpy.AddField_management(Input_Layer, vtab.getValue(x, 0), vtab.getValue(x, 1))
        arcpy.AddMessage(" Added field " + vtab.getValue(x, 0) + ".")
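# USAGE SKETCH FOR add_fields (NAMES BELOW ARE HYPOTHETICAL): PASS THE TABLE AND A
# LIST OF 9, 13, 17, OR 21 FIELD NAMES SO THE ROW BLOCKS ABOVE LINE UP, E.G.:
#   demo_headers = ["workspace", "data_type", "table_name", "column_name",
#                   "col_type", "col_width", "col_prec", "col_alias", "join_field"]
#   add_fields(demo_table, demo_headers)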
def dict_domainname_domaincolumntype(input_wksp):
    # CREATE A DICTIONARY FROM THE WORKSPACE DOMAINS; THE DOMAIN NAME IS THE KEY
    # AND THE VALUE IS A LIST OF [FIELD TYPE, DOMAIN TYPE].
    # RETURNS NONE IF THE WORKSPACE TYPE IS FILE SYSTEM (FOLDERS HAVE NO DOMAINS).
desc = arcpy.Describe(input_wksp)
wksp_type = desc.workspaceType
if wksp_type == 'FileSystem':
d = None
else:
d = {}
doms = arcpy.da.ListDomains(input_wksp)
for dom in doms:
d[dom.name] = [dom.type, dom.domainType]
return d
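# SKETCH OF THE RETURNED DICTIONARY SHAPE (DOMAIN NAMES BELOW ARE HYPOTHETICAL);
# EACH VALUE IS [FIELD TYPE, DOMAIN TYPE] AS REPORTED BY arcpy.da.ListDomains:
#   {"PipeMaterial": ["Text", "CodedValue"], "PressureRange": ["Double", "Range"]}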
def insert_rows(in_layer, field_list, rows_to_insert):
    # INSERTS A PYTHON LIST OF LISTS; EACH INNER LIST IS ONE ROW OF DATA.
    # PARAMETERS: LAYER TO INSERT ROWS INTO, LIST OF FIELD NAME STRINGS, DATA ROWS.
    # THE with BLOCK RELEASES THE CURSOR EVEN IF AN INSERT FAILS.
    with arcpy.da.InsertCursor(in_layer, field_list) as cursor:
        for row in rows_to_insert:
            cursor.insertRow(row)
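# USAGE SKETCH (VALUES ARE HYPOTHETICAL): FIELD ORDER MUST MATCH THE ROW ORDER.
#   insert_rows(out_table, ["table_name", "column_name"],
#               [["wells", "api_no"], ["wells", "status"]])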
def replace_non_alphanum_char_with_other(in_string, replace_string):
# REPLACES NON-ALPHANUMERIC CHARACTERS WITH A REPLACE STRING.
# FOR EXAMPLE, YOU COULD REPLACE ALL THE NON-ALPHANUMERIC CHARACTERS WITH A SINGLE UNDERSCORE.
# ex: change 1001 - Deep Fork Gathering to 1001_Deep_Fork_Gathering
s = in_string
for c in ("!", "@", "^", "+", "=", "|", ";", "<", ">", "~", "`", " ", "-", "(", ")", "#", "?", "%", "&", "*", "{", "}", "[", "]", ".", ":", "\\", "/"):
if c in s:
s = s.replace(c, replace_string)
s = s.replace(replace_string + replace_string, replace_string).replace(replace_string + replace_string, replace_string).replace(replace_string + replace_string, replace_string).replace(replace_string + replace_string, replace_string).replace(replace_string + replace_string, replace_string)
return s
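# EXAMPLE: replace_non_alphanum_char_with_other("1001 - Deep Fork Gathering", "_")
# RETURNS "1001_Deep_Fork_Gathering"; THE " - " RUN COLLAPSES TO A SINGLE "_".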
def string_is_date(string):
    # DETERMINES IF A STRING COULD BE A DATE (USES dateutil.parser.parse, IMPORTED ABOVE).
    try:
        parse(string)
        return True
    except (ValueError, OverflowError, TypeError):
        return False
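# EXAMPLES: string_is_date("2020-01-31") -> True; string_is_date("N/A") -> False.
# NOTE THAT dateutil IS PERMISSIVE: A BARE NUMBER LIKE "5" ALSO PARSES AS A DATE.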
def string_is_number(s):
# CHECKS TO SEE IF STRING IS NUMERIC. BOOLEAN.
try:
float(s)
return True
except ValueError:
return False
def string_is_long_or_double(s):
''' RETURNS "LONG" OR "DOUBLE" DEPENDING ON NUMERIC DATA TYPE THAT STRING CONVERTS TO.
ASSUMES STRING HAS BEEN VERIFIED AS NUMERIC.'''
if round(float(s), 0)//float(s) != 1:
return "Double"
else:
return "Long"
def get_distinct_values(in_layer, in_col):
    # RETURN THE SET OF DISTINCT VALUES FROM THE SPECIFIED COLUMN.
    values = [row[0] for row in arcpy.da.SearchCursor(in_layer, in_col)]
    return set(values)
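# USAGE SKETCH (NAMES ARE HYPOTHETICAL): get_distinct_values(wells_layer, "status")
# MIGHT RETURN {"Active", "Plugged", None}.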
def table_schema_to_file(tab_path, doDomains, doCounts, dom_dict):
    # COLLECTS TABLE SCHEMA AND COMPILES THE ROWS TO INSERT INTO THE OUTPUT TABLE.
    try:
        # FIRST, DESCRIBE THE TABLE AND COLLECT ITS FIELDS. ALSO SET INITIAL
        # VARIABLES, INCLUDING THE LIST OF VALUES TREATED AS NULL AND A FLAG
        # FOR WHETHER THE TABLE IS AN EXCEL SHEET.
rows_to_add = []
lfields=arcpy.ListFields(tab_path)
d = arcpy.Describe(tab_path)
t = d.baseName
db = d.path
is_excel = False
        if str(db).lower().endswith((".xls", ".xlsx")):
            t = t.strip("'")  # EXCEL SHEET NAMES COME BACK QUOTED; STRIP THE QUOTES.
            is_excel = True
dtype = d.dataType
nullvalues = ["", " ", None]
totalCount = 0
# ITERATE THROUGH TABLE FIELDS.
for i, lf in enumerate(lfields):
field_type = str(lf.type)
# IF COLUMN HEADER HAS INVALID CHARACTERS, REPLACE WITH UNDERSCORES.
# KEEP ORIGINAL COLUMN NAME AND MAKE THE VALID NAME THE ALIAS NAME.
excel_col_alias = lf.aliasName
            for c in ("!", "@", "^", "+", "=", "|", ";", "<", ">", "~", "`", " ", "-", "(", ")", "#", "?", "%", "&", "*", "{", "}", "[", "]", ".", ":", "/", "\\"):
                if c in lf.name:
                    # ONE CALL TO THE HELPER REPLACES ALL INVALID CHARACTERS, SO STOP AT THE FIRST HIT.
                    excel_col_alias = replace_non_alphanum_char_with_other(lf.name, "_")
                    break
# START THE INSERT ROW WITH THE DATA COLLECTED SO FAR.
this_line = []
this_line.append(str(db))
this_line.append(str(dtype))
this_line.append(str(t))
this_line.append(str(lf.name))
this_line.append(field_type)
this_line.append(float(lf.length))
this_line.append(float(lf.precision))
this_line.append(excel_col_alias)
this_line.append(t.lower() + "_" + lf.name.lower())
if (doCounts == True): # IF DOING COLUMN COUNT STATS...
# SET INITIAL VARIABLES
rowCount = 0
hasZeros = False
minNum = 0
maxNum = 0
colLen = 0
pcPop = 0
maxDate = None
minDate = None
is_double = True
is_long = True
column_exists = False
first_string = None
is_date = True
is_guid = True
# ITERATE THROUGH EACH ROW OF THE INPUT TABLE.
with arcpy.da.SearchCursor(tab_path, "*") as s_cur:
for r in s_cur:
v = r[i] # REMEMBER WE ARE ALSO STILL ITERATING THROUGH
# EACH FIELD. SO, WE ARE COLLECTING THE VALUE
# FROM THE FIELD AT HAND.
# IF VALUE IS NOT NULL, MODIFY VARS AS APPROPRIATE.
if not(v in nullvalues):
column_exists = True
rowCount += 1
if lf.type in ("Integer", "Double", "OID"):
if v == 0:
hasZeros = True
if minNum == 0 or v < minNum:
minNum = v
if maxNum == 0 or v > maxNum:
maxNum = v
if lf.type == "String":
if v == "0":
hasZeros = True
if colLen == 0 or len(v) > colLen:
colLen = len(v)
if lf.type == "Date":
if maxDate is None or v > maxDate:
maxDate = v
if minDate is None or v < minDate:
minDate = v
# THE EXCEL SECTION IS SO WE CAN SKIP EMPTY SHEETS,
# AS WELL AS IDENTIFY COLUMN TYPES MORE SPECIFICALLY
# THAN ARCGIS DOES OUT OF THE BOX.
if is_excel == True:
if lf.type == "String":
if v == "0":
hasZeros = True
if string_is_number(v):
if minNum == 0 or float(v) < minNum:
minNum = float(v)
if maxNum == 0 or float(v) > maxNum:
maxNum = float(v)
the_type = string_is_long_or_double(str(v))
if the_type == "Double":
is_long = False
is_date = False
is_guid = False
if the_type == "Long":
is_date = False
is_guid = False
else:
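                                        # ARCGIS NAMES HEADERLESS EXCEL COLUMNS F1, F2, ...;
                                        # TREAT THE FIRST STRING VALUE IN SUCH A COLUMN AS THE
                                        # REAL HEADER, USE IT FOR THE ALIAS, AND DO NOT COUNT
                                        # THE HEADER CELL AS DATA.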
if lf.name[0:1] == "F" and first_string is None and lf.name[1:2].isdigit() == True:
first_string = str(v)
excel_col_alias = replace_non_alphanum_char_with_other(str(v), "_")
rowCount -= 1
else:
is_double = False
is_long = False
if v[0:1] == "{" and v[-1:] == "}":
pass
else:
is_guid = False
if string_is_date(v) == False:
is_date = False
if string_is_date(v) == True:
if maxDate is None or v > maxDate:
maxDate = v
if minDate is None or v < minDate:
minDate = v
elif lf.type == "Double":
num_type = string_is_long_or_double(str(v))
if num_type == "Double":
is_long = False
is_date = False
is_guid = False
elif lf.type == "Date":
is_long = False
is_double = False
is_guid = False
else:
pass
else:
is_long = False
is_double = False
is_date = False
is_guid = False
if lf.type == "OID":
totalCount = rowCount
if is_excel == True:
# USE THE HIGHEST COLUMN COUNT AS THE RECORD COUNT.
# THIS WILL NOT BE EXACT IF THERE ARE TITLES/TOTALS.
if rowCount > totalCount:
totalCount = rowCount
# APPEND THE COUNT STATS TO THE INSERT ROW.
this_line.append(float(rowCount))
this_line.append(float(pcPop))
this_line.append(str(hasZeros))
this_line.append(float(colLen))
this_line.append(float(minNum))
this_line.append(float(maxNum))
this_line.append(minDate)
this_line.append(maxDate)
# UPDATE THE EXCEL VALUES TO BE MORE SPECIFIC.
# HAVE TO WAIT UNTIL THE END BECAUSE NEED TO ITERATE ENTIRE COLUMN.
if is_excel == True:
if is_double == True:
this_line[4] = "Double"
this_line[5] = None
this_line[6] = None
this_line[12] = None
if is_long == True:
this_line[4] = "Long"
this_line[5] = None
this_line[6] = None
this_line[12] = None
if is_date == True:
this_line[4] = "Date"
this_line[5] = None
this_line[6] = None
this_line[12] = None
if is_guid == True:
this_line[4] = "GUID"
this_line[5] = None
this_line[6] = None
this_line[12] = None
if rowCount == 0:
this_line[4] = "String"
this_line[5] = None
this_line[6] = None
this_line[12] = None
if excel_col_alias is not None:
this_line[7] = excel_col_alias
if (doDomains == True): # IF DOING DOMAIN DATA...
sDomain = ""
sFlag = ""
sType = ""
sDType = ""
# IF THIS FIELD IS ASSOCIATED WITH A DOMAIN AND THE DOMAIN
# EXISTS IN THE PASSED DICTIONARY, APPEND THE INFO TO THE
# INSERT ROW.
if not(dom_dict is None):
if not(lf.domain is None):
sDomain = lf.domain
sFlag = "No"
if sDomain in dom_dict:
sType = dom_dict[sDomain][0]
sDType = dom_dict[sDomain][1]
sFlag = "Yes"
this_line.append(sFlag)
this_line.append(str(sDomain))
this_line.append(str(sType))
this_line.append(str(sDType))
if column_exists == True:
rows_to_add.append(this_line)
        # IF THE TABLE IS POPULATED, UPDATE THE PERCENT-POPULATED COLUMN (INDEX 10)
        # FOR EACH FIELD FROM THE FIELD'S OWN COUNT (INDEX 9) AND THE FINAL RECORD COUNT.
        # USE 2 DECIMAL PLACES TO SUPPORT LARGE TABLES WITH ONLY A FEW RECS POPULATED.
        if totalCount > 0:
            for row in rows_to_add:
                row[10] = round((float(row[9]) / float(totalCount)) * 100, 2)
        return rows_to_add
    except Exception:
        tb = sys.exc_info()[2]
        tbinfo = traceback.format_tb(tb)[0]
        pymsg = tbinfo + "\n" + str(sys.exc_info()[0]) + ": " + str(sys.exc_info()[1])
        arcpy.AddError("Python Messages: " + pymsg + " GP Messages: " + arcpy.GetMessages(2))
if __name__ == "__main__":
in_workspace = arcpy.GetParameterAsText(0) # WORKSPACE
incl_domains = arcpy.GetParameterAsText(1) # BOOLEAN, DEFAULT True
incl_counts = arcpy.GetParameterAsText(2) # BOOLEAN, DEFAULT True
prefix_chosen = arcpy.GetParameterAsText(3) # STRING, DEFAULT Other
out_workspace = arcpy.GetParameterAsText(4) # WORKSPACE
out_table_name = arcpy.GetParameterAsText(5) # STRING
# CREATE COLUMN NAMES
if prefix_chosen == "For ETL Source":
n_prefix = "source_"
s_prefix = "src_"
n_suffix = ""
elif prefix_chosen == "For ETL Target":
n_prefix = "target_"
s_prefix = "targ_"
n_suffix = ""
else:
n_prefix = ""
s_prefix = ""
n_suffix = "_name"
w_name = n_prefix + "workspace"
d_type = n_prefix + "data_type"
tab_name = n_prefix + "table" + n_suffix
col_name = n_prefix + "column" + n_suffix
col_type = s_prefix + "col_type"
col_width = s_prefix + "col_width"
col_prec = s_prefix + "col_prec"
col_alias = s_prefix + "col_alias"
col_count = s_prefix + "col_cnt"
col_pc = s_prefix + "col_pc_pop"
contains_zeros = s_prefix + "col_has_zeros"
max_len = s_prefix + "col_len"
min_val = s_prefix + "col_min"
max_val = s_prefix + "col_max"
min_date_col = s_prefix + "date_min"
max_date_col = s_prefix + "date_max"
dom_flag = s_prefix + "domain_flag"
col_domain = n_prefix + "domain_name"
dom_col_type = s_prefix + "domain_col_type"
col_join = s_prefix + "join_field"
dom_typ_col = s_prefix + "domain_type"
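    # EXAMPLES OF THE RESULTING NAMES: WITH "For ETL Source", w_name IS
    # "source_workspace" AND col_type IS "src_col_type"; WITH THE DEFAULT PREFIX,
    # tab_name IS "table_name" AND col_name IS "column_name".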
try:
arcpy.AddMessage(" ")
# GET OBJECTS PRESENT IN THE INPUT WORKSPACE
arcpy.env.workspace = in_workspace
# COLLECT FEATURE CLASSES FROM WORKSPACE TREE
FCs = []
walk = arcpy.da.Walk(in_workspace, datatype='FeatureClass')
for dirpath, dirnames, filenames in walk:
for filename in filenames:
FCs.append(os.path.join(dirpath, filename))
# COLLECT TABLES FROM WORKSPACE TREE
tables = []
walk = arcpy.da.Walk(in_workspace, datatype='Table')
for dirpath, dirnames, filenames in walk:
for filename in filenames:
tables.append(os.path.join(dirpath, filename))
        # CREATE OUTPUT TABLE
        out_table = arcpy.CreateTable_management(out_workspace, out_table_name)
# CREATE THE COLUMN HEADINGS LIST
# 0 1 2 3 4 5 6 7 8
tableheaders = [w_name, d_type, tab_name, col_name, col_type, col_width, col_prec, col_alias, col_join]
# IF USER SPECIFIED INCLUDING COLUMN COUNT STATISTICS, ADD THOSE COLUMN HEADINGS
if (str(incl_counts).lower() == "true"):
doCounts = True
tableheaders.append(col_count) # 9
tableheaders.append(col_pc) # 10
tableheaders.append(contains_zeros) # 11
tableheaders.append(max_len) # 12
tableheaders.append(min_val) # 13
tableheaders.append(max_val) # 14
tableheaders.append(min_date_col) # 15
tableheaders.append(max_date_col) # 16
else:
doCounts = False
# IF USER SPECIFIED INCLUDING DOMAIN STATISTICS, ADD THOSE COLUMN HEADINGS
if (str(incl_domains).lower() == "true"):
doDomains = True
tableheaders.append(dom_flag) # 17
tableheaders.append(col_domain) # 18
tableheaders.append(dom_col_type) # 19
tableheaders.append(dom_typ_col) # 20
else:
doDomains = False
# ADD THE COLUMN HEADINGS TO THE OUTPUT TABLE
add_fields(out_table, tableheaders)
# GET THE DOMAIN INFO FROM THE GEODATABASE
# (IF THIS IS NOT A GEODATABASE, THIS FUNCTION WILL RETURN NONE)
if doDomains == True:
dom_dict = dict_domainname_domaincolumntype(in_workspace)
        else:
            dom_dict = None  # USE None TO MATCH THE CHECK IN table_schema_to_file.
arcpy.AddMessage(" Writing workspace schema to file...please wait...")
# LOOP THROUGH FEATURE CLASSES, WRITING SCHEMA TO FILE
for x in FCs:
try:
arcpy.env.workspace = in_workspace
desc = arcpy.Describe(x)
x_path = desc.catalogPath
rows1 = table_schema_to_file(x_path, doDomains, doCounts, dom_dict)
arcpy.AddMessage(" Inserting " + str(len(rows1)) + " rows for FC " + str(x) + "...")
insert_rows(out_table, tableheaders, rows1)
del rows1
            except Exception:
                tb = sys.exc_info()[2]
                tbinfo = traceback.format_tb(tb)[0]
                pymsg = tbinfo + "\n" + str(sys.exc_info()[0]) + ": " + str(sys.exc_info()[1])
                arcpy.AddError("Python Messages: " + pymsg + " GP Messages: " + arcpy.GetMessages(2))
continue
# LOOP THROUGH TABLES, WRITING SCHEMA TO FILE
for x in tables:
try:
rows2 = []
arcpy.env.workspace = in_workspace
desc = arcpy.Describe(x)
x_path = desc.catalogPath
rows2 = table_schema_to_file(x_path, doDomains, doCounts, dom_dict)
arcpy.AddMessage(" Inserting " + str(len(rows2)) + " rows for Table " + str(x) + " ...")
insert_rows(out_table, tableheaders, rows2)
del rows2
            except Exception:
                arcpy.AddWarning("ERROR: " + str(x))
                arcpy.AddWarning("ERROR: " + str(rows2))
                tb = sys.exc_info()[2]
                tbinfo = traceback.format_tb(tb)[0]
                pymsg = tbinfo + "\n" + str(sys.exc_info()[0]) + ": " + str(sys.exc_info()[1])
                arcpy.AddError("Python Messages: " + pymsg + " GP Messages: " + arcpy.GetMessages(2))
continue
# OUTPUT PARAMETER
arcpy.SetParameter(6, out_table) # TABLEVIEW
arcpy.AddMessage(" ")
    except Exception:
        tb = sys.exc_info()[2]
        tbinfo = traceback.format_tb(tb)[0]
        pymsg = tbinfo + "\n" + str(sys.exc_info()[0]) + ": " + str(sys.exc_info()[1])
        arcpy.AddError("Python Messages: " + pymsg + " GP Messages: " + arcpy.GetMessages(2))
    finally:
        # CLEAN UP. SOME NAMES ARE ONLY BOUND IF THE TRY BLOCK GOT FAR ENOUGH,
        # SO USE pop() TO AVOID A NameError HERE MASKING THE REAL ERROR.
        del in_workspace, incl_domains, incl_counts, out_workspace, out_table_name, prefix_chosen
        for _name in ("out_table", "tableheaders", "dom_dict", "tables", "FCs"):
            globals().pop(_name, None)