GIS From The Woods: Domains: List Distinct Values

Do you work on the IS side of GIS, in ESRI geodatabases? The day may come when you are faced with culling domain values from what seems like a gazillion tables, organizing and verifying the values, and loading them into multiple domains.

Sorting through domain values by visually inspecting each column and loading domains one by one can get tiresome. It may be that you decide that the best thing to do is to learn how to script and automate much of the domain value sorting, validating and loading process.

Whether you are just starting out and want some tips, or whether you don't like scripting and just want something that works--this blog is for you.

This series of posts on domains will put you on your way. Each post describes a python script tool that can be added to an ArcGIS custom toolbox. Collectively, the tools automate the domain workflow, reduce errors, and save time.

Summary Of Tool

The List Distinct Values tool outputs a list of the unique values contained in each input column to a table. The output table columns are as follows:

Domain (Domain name--left blank for this tool)
DomainType (Type of domain, coded, range, etc.--left blank for this tool)
Code (Domain value)
Description (Domain description)
RangeMin (Range domains only--minimum value--left blank for this tool)
RangeMax (Range domains only--maximum value--left blank for this tool)
DomainColType (Domain column type, Double, Long, Text, etc.--based on field type)
TableName (Table name)
ColumnName (Column name)

The output table schema matches the table schema used in the Batch Domain Loading tool, as well as the Export Geodatabase Domains tool.

Database types supported are: ESRI personal, file, and SDE databases, Oracle and SQL Server remote/local databases, shapefiles, stand-alone tables supported by ArcGIS.

The output from this tool can be used for verifying and editing domain values, for domain documentation, as well as for a domain loading table. It can also be used for non-domain value clean-up.

Illustration

Usage

There is no usage for this tool.

Syntax

ListDistinctValues_mytools (Input_Table, Input_Columns, Output_Workspace, Output_Table_Name)

Parameter	Explanation	Data Type
Input_Table	Dialog Reference Input table ___________________ Python Reference Input table	Table View
Input_Columns	Dialog Reference Input columns ___________________ Python Reference Input columns	Multiple Value
Output_Workspace	Dialog Reference Output workspace ___________________ Python Reference Output workspace	Workspace
Output_Table_Name	Dialog Reference Output table name ___________________ Python Reference Output table name	String

Code Samples

ListDistinctValues.py

See below for tool source code.

import arcpy, traceback, sys, os
arcpy.env.overwriteOutput = True

def add_fields(Input_Layer, field_name_list):
    '''ADDS FIELDS TO A TABLE. USES A VALUE TABLE TO COLLECT FIELD PROPERTIES. 
    The example below shows one way to manage field names for easier coding.
    Create the field name list in your main code, then paste below for reference.
        # NAMES FOR FIELDS TO ADD
        col_name = "Domain"
        col_type = "DomainType"
        col_code = "Code"
        col_desc = "Description"
        col_min = "RangeMin"
        col_max = "RangeMax"
        col_ctype = "DomainColType"
        col_tab = "TableName"
        col_col = "ColumnName"
    Create table syntax:  CreateTable_management (out_path, out_name, {template}, {config_keyword})
    '''
    # CREATE VALUE TABLE FOR FIELD PROPERTIES
    # ADDROW FORMAT: Name Type Length - Length is optional
    vtab = arcpy.ValueTable(3)
    vtab.addRow(field_name_list[0] + " text 255")
    vtab.addRow(field_name_list[1] + " text 15") 
    vtab.addRow(field_name_list[2] + " text 100") 
    vtab.addRow(field_name_list[3] + " text 100") 
    vtab.addRow(field_name_list[4] + " Double") 
    vtab.addRow(field_name_list[5] + " Double") 
    vtab.addRow(field_name_list[6] + " text 15") 
    vtab.addRow(field_name_list[7] + " text 255") 
    vtab.addRow(field_name_list[8] + " text 255") 
    
    x = 0
    # ITERATE THE VALUE TABLE, ADDING FIELDS TO THE INPUT TABLE
    while x < vtab.rowCount:
        if (vtab.getValue(x, 1) == "text"):
            arcpy.AddField_management(Input_Layer, vtab.getValue(x, 0), vtab.getValue(x, 1), field_length=(int(vtab.getValue(x, 2))))
        else:
            arcpy.AddField_management(Input_Layer, vtab.getValue(x, 0), vtab.getValue(x, 1))
        arcpy.AddMessage("          Added field " + vtab.getValue(x, 0) + ".")
        x += 1

def fieldname_list_from_layer(in_layer):
    # RETURNS A PYTHON LIST OF FIELD NAME STRINGS FROM AN INPUT FEATURE LAYER
    desc = arcpy.Describe(in_layer)
    l_path = desc.catalogPath
    flds = arcpy.ListFields(l_path, "*")
    return flds
    
def insert_rows(in_layer, field_list, rows_to_insert):
    # USES DA.INSERTCURSOR TO INSERT ROWS OF DATA INTO INPUT TABLE.
    # PARAMETERS: INPUT LAYER, LIST OF FIELD NAME STRINGS, LIST OF DATA ROW LISTS.
    cursor = arcpy.da.InsertCursor(in_layer, field_list)
    for row in rows_to_insert:
        cursor.insertRow(row)
    del cursor
    
def workspace_from_fc(in_feature_class):
    # RETURNS WORKSPACE NAME FROM FEATURE CLASS PATH OR TABLEVIEW LAYER
    wksp = os.path.dirname(in_feature_class)
    desc = arcpy.Describe(wksp)
    if hasattr(desc, "datasetType") and desc.datasetType=='FeatureDataset':
        wksp = os.path.dirname(wksp)    
    return wksp

def get_basename_from_layername(in_layer):
    # RETURNS BASE NAME FROM LAYER NAME.
    # RETURNS TABLE NAME OR FEATURE CLASS NAME.
    desc_1 = arcpy.Describe(in_layer)
    cat_path = desc_1.catalogPath
    desc_2 = arcpy.Describe(cat_path)
    t = desc_1.baseName
    return t

def get_distinct_values(in_layer, in_col):
    # RETURNS A LIST OF THE UNIQUE VALUES CONTAINED IN THE INPUT COLUMN.
    l = [row[0] for row in arcpy.da.SearchCursor(in_layer, in_col)]
    list_of_distinct_values = set(l)
    return list_of_distinct_values
    
def get_field_type(in_layer, in_field):
    # RETURNS ARCGIS STANDARD FIELD TYPE FOR INPUT FIELD.
    # CONVERTS LISTFIELDS FIELD TYPE TO ARCGIS STANDARD FIELD TYPE.
    desc = arcpy.Describe(in_layer)
    l_path = desc.catalogPath
    flds = arcpy.ListFields(l_path, "*")
    ftype = ""
    for f in flds:
        if f.name.lower()==in_field.lower():
            ftype = f.type
    if ftype == "String":
        ftype = "Text"
    elif ftype == "Integer":
        ftype = "Long Integer"
    elif ftype == "Single":
        ftype = "Float"
    elif ftype == "SmallInteger":
        ftype = "Short Integer"
    else:
        pass
    return ftype

if __name__ == "__main__":

    in_table = arcpy.GetParameterAsText(0) # TABLEVIEW
    in_columns = arcpy.GetParameterAsText(1).split(";") # FIELD, MULTIPLE
    out_workspace = arcpy.GetParameterAsText(2) # WORKSPACE
    out_table_name = arcpy.GetParameterAsText(3) # STRING

    try:

        arcpy.AddMessage(" ")
        
        # CREATE INTERMEDIATE OUTPUT TABLE
        arcpy.AddMessage("     Creating output table...")
        arcpy.Delete_management("in_memory")
        mem_table = arcpy.CreateTable_management ("in_memory", out_table_name)
    
        # CREATE THE COLUMN HEADINGS LIST
        col_name = "Domain"
        col_type = "DomainType"
        col_code = "CODE"
        col_desc = "DESCRIPTION"
        col_min = "RangeMin"
        col_max = "RangeMax"
        col_ctype = "DomainColType"
        col_tab = "TableName"
        col_col = "ColumnName"
        field_names = [col_name, col_type, col_code, col_desc, col_min, col_max, col_ctype, col_tab, col_col]

        # ADD THE COLUMN HEADINGS TO THE OUTPUT TABLE
        add_fields(mem_table, field_names)
        
        # CREATE LIST TO HOLD ROWS OF DATA TO INSERT INTO OUTPUT TABLE
        rows_to_insert = []

        this_line = []
        val_list = []
        tabname = get_basename_from_layername(in_table)
        arcpy.AddMessage("     Collecting values from table " + in_table + "...")
        # ITERATE INPUT FIELDS.
        for fld in in_columns:
            # GET FIELD TYPE AND LIST OF UNIQUE VALUES FOR THIS COLUMN.
            sDomain = ""
            sType = get_field_type(in_table, fld)
            sDType = ""
            val_list = get_distinct_values(in_table, fld)
            # ITERATE VALUES, CREATING INSERT ROW LIST FOR THIS FIELD
            # AND APPEND IT TO THE LIST OF ROWS TO INSERT
            for v in val_list:
                this_line = [] # START LIST OF VALUES FOR THIS ROW.
                this_line.append(None) # SKIP DOMAIN NAME SINCE IT IS UNKNOWN.
                this_line.append(None) # SKIP DOMAIN TYPE SINCE IT IS UNKNOWN.
                if v is None: # FOR NULL, CODE="NULL" AND DESCRIPTION IS BLANK.
                    this_line.append("NULL")
                    this_line.append(None)
                else: # VALUE NOT NULL, POPULATE CODE AND DESCRIPTION WITH VALUE.
                    this_line.append(str(v))
                    this_line.append(str(v))
                this_line.append(None) # SKIP RANGE MIN SINCE IT IS UNKNOWN.
                this_line.append(None) # SKIP RANGE MAX SINCE IT IS UNKNOWN.
                this_line.append(sType) # FIELD TYPE
                this_line.append(tabname.upper()) # TABLE NAME IN UPPER CASE
                this_line.append(fld.upper()) # FIELD NAME IN UPPER CASE
                rows_to_insert.append(this_line)
            
        # POPULATE OUTPUT TABLE AND SAVE TO DISK
        arcpy.AddMessage("     Creating output table...")
        insert_rows(mem_table, field_names, rows_to_insert)
        out_table = arcpy.TableToTable_conversion(mem_table, out_workspace, out_table_name)

        # OUTPUT PARAMETER
        arcpy.SetParameter(4, out_table) # TABLEVIEW
                
        arcpy.AddMessage(" ")
               
    except:

        # PRINT ERROR MESSAGES
        tb = sys.exc_info()[2]
        tbinfo = traceback.format_tb(tb)[0]
        pymsg = tbinfo + "\n" + str(sys.exc_type)+ ": " + str(sys.exc_value)
        arcpy.AddError("Python Messages: " + pymsg + " GP Messages: " + arcpy.GetMessages(2))
        
    finally:
        del in_table, out_workspace, out_table_name
        del out_table
        del rows_to_insert, this_line
        del val_list
        arcpy.Delete_management("in_memory")

Credits

Ruth Bowers, 2016

Use limitations

Tested in ArcGIS 10.2.2 as python script tool in custom toolbox. Not tested on Excel and text files.

GIS From The Woods

Sunday, April 17, 2016

Domains: List Distinct Values

Summary Of Tool

Illustration

Usage

Syntax

Code Samples

Tags

Credits

Use limitations

No comments:

Post a Comment

Blog Archive