Summary
This tool uses arcpy.da.Walk and arcpy.Describe to export the schema of a workspace to a table. Each column of each table is written to one output row.

The tool is designed for feature class and tabular data. Workspace types supported: personal geodatabase; file geodatabase; enterprise geodatabase; feature datasets; remote databases (non-geodatabase); and folders. Feature class and table types supported: geodatabase and shapefile feature classes; stand-alone geodatabase tables; Oracle tables; SQL Server tables; dBase files; text files; CSV files; and Excel spreadsheets (single-table tabs and named ranges).

This tool's output is very detailed, and the tool may take some time to run, especially on large remote databases (testing ranged from a few minutes to about an hour). For record counts of tables and feature classes without as much detail, use the List Record Counts tool. For a brief list of workspace contents, use the List Workspace Contents tool.
The default fields included are:
- workspace (Workspace)
- data_type (Data type per arcpy.Describe)
- table_name (Table name)
- col_name (Column name)
- col_type (Column type)
- col_width (Column width)
- col_prec (Column precision)
- col_alias (Column alias)
- join_field (the table name and column name concatenated with an underscore)
The optional columns included with column statistics are:
- col_cnt (count of rows populated)
- col_pc_pop (percentage of rows populated)
- col_has_zeros (whether the column contains zeros; applies to both text and numeric columns)
- col_len (text columns only; maximum length in column)
- col_min (numeric columns only; minimum value in column)
- col_max (numeric columns only; maximum value in column)
- date_min (date columns only; earliest date in column)
- date_max (date columns only; most recent date in column)
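The col_pc_pop value is computed after the full table scan and rounded to two decimal places, so that large tables with only a few populated rows still show a nonzero percentage. A minimal pure-Python sketch of that calculation (the function name is illustrative, not part of the tool):

```python
def percent_populated(populated_count, total_count):
    """Percentage of rows populated, rounded to 2 decimal places.

    Mirrors the tool's col_pc_pop calculation; returns 0 when the
    table has no rows at all.
    """
    if total_count == 0:
        return 0
    return round((float(populated_count) / float(total_count)) * 100, 2)

print(percent_populated(1, 3))          # 33.33
print(percent_populated(40000, 40000))  # 100.0
```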
The optional columns included with domain data are:
- domain_flag (Yes/No)
- domain_name (domain name associated with the column, if any)
- domain_type (domain type: coded, range, etc.)
- domain_column_type (domain column type: Text, Double, Long, etc.)
The user also chooses between three column heading formats: For ETL Source, For ETL Target, and Other. Respectively, these produce column prefixes of "source_"/"src_", "target_"/"targ_", or no prefix. For example, the table name column heading would be "source_table", "target_table", or "table_name", respectively.
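The heading derivation above can be sketched in plain Python (this mirrors the prefix/suffix logic in the tool's source; the helper function itself is illustrative):

```python
def heading_names(prefix_chosen):
    """Return (table-name heading, column-name heading) for a heading format.

    Sketch of the prefix/suffix logic in the tool's main block.
    """
    if prefix_chosen == "For ETL Source":
        n_prefix, n_suffix = "source_", ""
    elif prefix_chosen == "For ETL Target":
        n_prefix, n_suffix = "target_", ""
    else:  # "Other"
        n_prefix, n_suffix = "", "_name"
    return (n_prefix + "table" + n_suffix, n_prefix + "column" + n_suffix)

print(heading_names("For ETL Source"))  # ('source_table', 'source_column')
print(heading_names("Other"))           # ('table_name', 'column_name')
```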
Note that Excel record counts and column types will be approximate due to ArcGIS limitations.
The output of this tool can be used to quickly assess and QC a database or group of shapefiles/tables, and it also supports longer-term QC work. For example, you can quickly select empty or partially empty columns and tables, examine data ranges, and review domain names, domain types, column types, and number ranges in one table. The output can also serve as a starting point in scripting, because the workspace and table name concatenate to a valid path.
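For example, if you pull the workspace and table name values out of the output table (the rows and helper name below are illustrative), they join directly into paths usable with arcpy:

```python
def dataset_path(workspace, table_name):
    """Join a workspace and table name from the tool's output into a dataset path.

    ArcGIS accepts forward slashes in paths, so a simple join is enough.
    """
    return workspace.rstrip("/\\") + "/" + table_name

# ILLUSTRATIVE (workspace, table_name) PAIRS, AS PRODUCED BY THE TOOL
rows = [
    ("C:/data/demo.gdb", "parcels"),
    ("C:/data/shapefiles", "roads.shp"),
]
for w, t in rows:
    print(dataset_path(w, t))
```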
Illustration
Usage
There are no special usage notes for this tool.
Syntax
WorkspaceSchemaToTable_mytools (Input_Workspace, {Include_Domains}, {Include_Counts}, Column_Headings, Output_Workspace, Output_Table_Name)
| Parameter | Explanation | Data Type |
|---|---|---|
| Input_Workspace | Input workspace. | Workspace |
| Include_Domains (Optional) | Check to include a column with the name of the assigned domain, if any. | Boolean |
| Include_Counts (Optional) | Check to include columns with various stats for each field. | Boolean |
| Column_Headings | Choose between three column heading formats: For ETL Source, For ETL Target, or Other. | String |
| Output_Workspace | Output workspace. | Workspace |
| Output_Table_Name | Output table name. | String |
Code Samples
WorkspaceSchema2Table.py
See below for tool source code.
```python
import arcpy, traceback, sys, os, time
from dateutil.parser import parse

arcpy.env.overwriteOutput = True


def add_fields(Input_Layer, field_name_list):
    '''Quick, hard-coded way to add fields to a table. The example below shows
    one way to manage field names for easier coding: create the field name list
    in your main code, then paste it here for reference.

    # NAMES FOR FIELDS TO ADD
    #               0       1       2         3         4         5          6         7          8
    tableheaders = [w_name, d_type, tab_name, col_name, col_type, col_width, col_prec, col_alias, col_join]  # 9 cols

    doCounts = True
    tableheaders.append(col_count)       # 9 - 8 cols
    tableheaders.append(col_pc)          # 10
    tableheaders.append(contains_zeros)  # 11
    tableheaders.append(max_len)         # 12
    tableheaders.append(min_val)         # 13
    tableheaders.append(max_val)         # 14
    tableheaders.append(min_date_col)    # 15
    tableheaders.append(max_date_col)    # 16

    doDomains = True
    tableheaders.append(dom_flag)      # 17 - 4 cols
    tableheaders.append(col_domain)    # 18
    tableheaders.append(dom_col_type)  # 19
    tableheaders.append(dom_typ_col)   # 20

    Create table syntax:
    CreateTable_management(out_path, out_name, {template}, {config_keyword})
    '''
    # CREATE VALUE TABLE FOR FIELDS TO ADD
    # ADDROW FORMAT: Name Type Length - Length is optional
    vtab = arcpy.ValueTable(3)
    vtab.addRow(field_name_list[0] + " text 255")
    vtab.addRow(field_name_list[1] + " text 15")
    vtab.addRow(field_name_list[2] + " text 100")
    vtab.addRow(field_name_list[3] + " text 100")
    vtab.addRow(field_name_list[4] + " text 15")
    vtab.addRow(field_name_list[5] + " Double")
    vtab.addRow(field_name_list[6] + " Double")
    vtab.addRow(field_name_list[7] + " text 50")
    vtab.addRow(field_name_list[8] + " text 255")
    if len(field_name_list) in (17, 21):
        vtab.addRow(field_name_list[9] + " Double")
        vtab.addRow(field_name_list[10] + " Double")
        vtab.addRow(field_name_list[11] + " text 5")
        vtab.addRow(field_name_list[12] + " Double")
        vtab.addRow(field_name_list[13] + " Double")
        vtab.addRow(field_name_list[14] + " Double")
        vtab.addRow(field_name_list[15] + " date")
        vtab.addRow(field_name_list[16] + " date")
    if len(field_name_list) == 21:
        vtab.addRow(field_name_list[17] + " text 5")
        vtab.addRow(field_name_list[18] + " text 100")
        vtab.addRow(field_name_list[19] + " text 15")
        vtab.addRow(field_name_list[20] + " text 15")
    if len(field_name_list) == 13:
        vtab.addRow(field_name_list[9] + " text 5")
        vtab.addRow(field_name_list[10] + " text 100")
        vtab.addRow(field_name_list[11] + " text 15")
        vtab.addRow(field_name_list[12] + " text 15")
    x = 0
    # LOOP THROUGH THE VALUE TABLE, ADDING FIELDS
    while x < vtab.rowCount:
        if vtab.getValue(x, 1) == "text":
            arcpy.AddField_management(Input_Layer, vtab.getValue(x, 0), vtab.getValue(x, 1),
                                      field_length=int(vtab.getValue(x, 2)))
        else:
            arcpy.AddField_management(Input_Layer, vtab.getValue(x, 0), vtab.getValue(x, 1))
        arcpy.AddMessage("  Added field " + vtab.getValue(x, 0) + ".")
        x += 1


def dict_domainname_domaincolumntype(input_wksp):
    # CREATE DICTIONARY FROM CODED VALUE DOMAINS; DOMAIN NAME IS KEY, DOMAIN VALUES ARE LIST VALUE.
    # RETURNS NONE IF WORKSPACE TYPE IS FILE SYSTEM.
    desc = arcpy.Describe(input_wksp)
    wksp_type = desc.workspaceType
    if wksp_type == 'FileSystem':
        d = None
    else:
        d = {}
        doms = arcpy.da.ListDomains(input_wksp)
        for dom in doms:
            d[dom.name] = [dom.type, dom.domainType]
    return d


def insert_rows(in_layer, field_list, rows_to_insert):
    # INSERTS A PYTHON LIST OF LISTS CONTAINING ROWS OF DATA TO INSERT.
    # PARAMETERS: LAYER TO INSERT ROWS INTO, LIST OF FIELD NAME STRINGS, DATA ROWS.
    cursor = arcpy.da.InsertCursor(in_layer, field_list)
    for row in rows_to_insert:
        cursor.insertRow(row)
    del cursor


def replace_non_alphanum_char_with_other(in_string, replace_string):
    # REPLACES NON-ALPHANUMERIC CHARACTERS WITH A REPLACE STRING.
    # FOR EXAMPLE, YOU COULD REPLACE ALL THE NON-ALPHANUMERIC CHARACTERS WITH A SINGLE UNDERSCORE.
    # ex: change "1001 - Deep Fork Gathering" to "1001_Deep_Fork_Gathering"
    s = in_string
    for c in ("!", "@", "^", "+", "=", "|", ";", "<", ">", "~", "`", " ", "-", "(", ")",
              "#", "?", "%", "&", "*", "{", "}", "[", "]", ".", ":", "\\", "/"):
        if c in s:
            s = s.replace(c, replace_string)
    # COLLAPSE RUNS OF THE REPLACEMENT STRING (REPEATED TO HANDLE LONG RUNS)
    for _ in range(5):
        s = s.replace(replace_string + replace_string, replace_string)
    return s


def string_is_date(string):
    # DETERMINES IF A STRING COULD BE A DATE
    # (USES dateutil.parser.parse)
    try:
        parse(string)
        return True
    except ValueError:
        return False


def string_is_number(s):
    # CHECKS TO SEE IF STRING IS NUMERIC. BOOLEAN.
    try:
        float(s)
        return True
    except ValueError:
        return False


def string_is_long_or_double(s):
    '''RETURNS "LONG" OR "DOUBLE" DEPENDING ON THE NUMERIC DATA TYPE THAT THE
    STRING CONVERTS TO. ASSUMES THE STRING HAS BEEN VERIFIED AS NUMERIC.'''
    # GUARD AGAINST DIVISION BY ZERO WHEN THE STRING IS "0"
    if float(s) != 0 and round(float(s), 0) // float(s) != 1:
        return "Double"
    else:
        return "Long"


def get_distinct_values(in_layer, in_col):
    # RETURN A UNIQUE SET OF VALUES FROM THE SPECIFIED COLUMN
    l = [row[0] for row in arcpy.da.SearchCursor(in_layer, in_col)]
    list_of_distinct_values = set(l)
    return list_of_distinct_values


def table_schema_to_file(tab_path, doDomains, doCounts, dom_dict):
    # COLLECTS TABLE SCHEMA AND COMPILES INSERT ROWS
    try:
        # FIRST, DESCRIBE TABLE AND COLLECT FIELDS. ALSO SET INITIAL VARIABLES,
        # AS WELL AS A LIST OF WHAT NULL MEANS AND WHETHER TABLE IS EXCEL.
        rows_to_add = []
        lfields = arcpy.ListFields(tab_path)
        d = arcpy.Describe(tab_path)
        t = d.baseName
        db = d.path
        is_excel = False
        if str(db)[-4:] in [".xls", "xlsx"]:
            t = t.strip("'")
            is_excel = True
        dtype = d.dataType
        nullvalues = ["", " ", None]
        totalCount = 0
        # ITERATE THROUGH TABLE FIELDS.
        for i, lf in enumerate(lfields):
            field_type = str(lf.type)
            # IF COLUMN HEADER HAS INVALID CHARACTERS, REPLACE WITH UNDERSCORES.
            # KEEP ORIGINAL COLUMN NAME AND MAKE THE VALID NAME THE ALIAS NAME.
            excel_col_alias = lf.aliasName
            invalid_column_name = False
            for c in ("!", "@", "^", "+", "=", "|", ";", "<", ">", "~", "`", " ", "-", "(", ")",
                      "#", "?", "%", "&", "*", "{", "}", "[", "]", ".", ":", "/", "\\"):
                if c in lf.name:
                    invalid_column_name = True
                    excel_col_alias = replace_non_alphanum_char_with_other(lf.name, "_")
            # START THE INSERT ROW WITH THE DATA COLLECTED SO FAR.
            this_line = []
            this_line.append(str(db))
            this_line.append(str(dtype))
            this_line.append(str(t))
            this_line.append(str(lf.name))
            this_line.append(field_type)
            this_line.append(float(lf.length))
            this_line.append(float(lf.precision))
            this_line.append(excel_col_alias)
            this_line.append(t.lower() + "_" + lf.name.lower())
            # DEFAULT TRUE SO ROWS ARE STILL WRITTEN WHEN COUNT STATS ARE SKIPPED
            column_exists = True
            if doCounts == True:
                # IF DOING COLUMN COUNT STATS...
                # SET INITIAL VARIABLES
                rowCount = 0
                hasZeros = False
                minNum = 0
                maxNum = 0
                colLen = 0
                pcPop = 0
                maxDate = None
                minDate = None
                is_double = True
                is_long = True
                column_exists = False
                first_string = None
                is_date = True
                is_guid = True
                # ITERATE THROUGH EACH ROW OF THE INPUT TABLE.
                with arcpy.da.SearchCursor(tab_path, "*") as s_cur:
                    for r in s_cur:
                        v = r[i]
                        # REMEMBER WE ARE ALSO STILL ITERATING THROUGH EACH FIELD,
                        # SO WE ARE COLLECTING THE VALUE FROM THE FIELD AT HAND.
                        # IF VALUE IS NOT NULL, MODIFY VARS AS APPROPRIATE.
                        if v not in nullvalues:
                            column_exists = True
                            rowCount += 1
                            if lf.type in ("Integer", "Double", "OID"):
                                if v == 0:
                                    hasZeros = True
                                if minNum == 0 or v < minNum:
                                    minNum = v
                                if maxNum == 0 or v > maxNum:
                                    maxNum = v
                            if lf.type == "String":
                                if v == "0":
                                    hasZeros = True
                                if colLen == 0 or len(v) > colLen:
                                    colLen = len(v)
                            if lf.type == "Date":
                                if maxDate is None or v > maxDate:
                                    maxDate = v
                                if minDate is None or v < minDate:
                                    minDate = v
                            # THE EXCEL SECTION IS SO WE CAN SKIP EMPTY SHEETS,
                            # AS WELL AS IDENTIFY COLUMN TYPES MORE SPECIFICALLY
                            # THAN ARCGIS DOES OUT OF THE BOX.
                            if is_excel == True:
                                if lf.type == "String":
                                    if v == "0":
                                        hasZeros = True
                                    if string_is_number(v):
                                        if minNum == 0 or float(v) < minNum:
                                            minNum = float(v)
                                        if maxNum == 0 or float(v) > maxNum:
                                            maxNum = float(v)
                                        the_type = string_is_long_or_double(str(v))
                                        if the_type == "Double":
                                            is_long = False
                                            is_date = False
                                            is_guid = False
                                        if the_type == "Long":
                                            is_date = False
                                            is_guid = False
                                    else:
                                        if lf.name[0:1] == "F" and first_string is None and lf.name[1:2].isdigit() == True:
                                            first_string = str(v)
                                            excel_col_alias = replace_non_alphanum_char_with_other(str(v), "_")
                                            rowCount -= 1
                                        else:
                                            is_double = False
                                            is_long = False
                                            if v[0:1] == "{" and v[-1:] == "}":
                                                pass
                                            else:
                                                is_guid = False
                                            if string_is_date(v) == False:
                                                is_date = False
                                            if string_is_date(v) == True:
                                                if maxDate is None or v > maxDate:
                                                    maxDate = v
                                                if minDate is None or v < minDate:
                                                    minDate = v
                                elif lf.type == "Double":
                                    num_type = string_is_long_or_double(str(v))
                                    if num_type == "Double":
                                        is_long = False
                                        is_date = False
                                        is_guid = False
                                elif lf.type == "Date":
                                    is_long = False
                                    is_double = False
                                    is_guid = False
                                else:
                                    pass
                            else:
                                is_long = False
                                is_double = False
                                is_date = False
                                is_guid = False
                if lf.type == "OID":
                    totalCount = rowCount
                if is_excel == True:
                    # USE THE HIGHEST COLUMN COUNT AS THE RECORD COUNT.
                    # THIS WILL NOT BE EXACT IF THERE ARE TITLES/TOTALS.
                    if rowCount > totalCount:
                        totalCount = rowCount
                # APPEND THE COUNT STATS TO THE INSERT ROW.
                this_line.append(float(rowCount))
                this_line.append(float(pcPop))
                this_line.append(str(hasZeros))
                this_line.append(float(colLen))
                this_line.append(float(minNum))
                this_line.append(float(maxNum))
                this_line.append(minDate)
                this_line.append(maxDate)
                # UPDATE THE EXCEL VALUES TO BE MORE SPECIFIC. HAVE TO WAIT
                # UNTIL THE END BECAUSE WE NEED TO ITERATE THE ENTIRE COLUMN.
                if is_excel == True:
                    if is_double == True:
                        this_line[4] = "Double"
                        this_line[5] = None
                        this_line[6] = None
                        this_line[12] = None
                    if is_long == True:
                        this_line[4] = "Long"
                        this_line[5] = None
                        this_line[6] = None
                        this_line[12] = None
                    if is_date == True:
                        this_line[4] = "Date"
                        this_line[5] = None
                        this_line[6] = None
                        this_line[12] = None
                    if is_guid == True:
                        this_line[4] = "GUID"
                        this_line[5] = None
                        this_line[6] = None
                        this_line[12] = None
                    if rowCount == 0:
                        this_line[4] = "String"
                        this_line[5] = None
                        this_line[6] = None
                        this_line[12] = None
                    if excel_col_alias is not None:
                        this_line[7] = excel_col_alias
            if doDomains == True:
                # IF DOING DOMAIN DATA...
                sDomain = ""
                sFlag = ""
                sType = ""
                sDType = ""
                # IF THIS FIELD IS ASSOCIATED WITH A DOMAIN AND THE DOMAIN
                # EXISTS IN THE PASSED DICTIONARY, APPEND THE INFO TO THE
                # INSERT ROW.
                if not (dom_dict is None):
                    if not (lf.domain is None):
                        sDomain = lf.domain
                        sFlag = "No"
                        if sDomain in dom_dict:
                            sType = dom_dict[sDomain][0]
                            sDType = dom_dict[sDomain][1]
                            sFlag = "Yes"
                this_line.append(sFlag)
                this_line.append(str(sDomain))
                this_line.append(str(sType))
                this_line.append(str(sDType))
            if column_exists == True:
                rows_to_add.append(this_line)
        # IF THE TABLE IS POPULATED, UPDATE THE PERCENTAGE POPULATED COLUMN FOR
        # EACH FIELD BASED ON THE FINAL RECORD COUNT.
        # USE 2 DECIMAL PLACES TO SUPPORT LARGE TABLES WITH ONLY A FEW RECS POPULATED.
        if totalCount == 0:  # 9, 10
            pass
        else:
            for row in rows_to_add:
                row[10] = round((float(row[9]) / float(totalCount)) * 100, 2)
        return rows_to_add
    except:
        exc_type, exc_value, tb = sys.exc_info()
        tbinfo = traceback.format_tb(tb)[0]
        pymsg = tbinfo + "\n" + str(exc_type) + ": " + str(exc_value)
        arcpy.AddError("Python Messages: " + pymsg + " GP Messages: " + arcpy.GetMessages(2))


if __name__ == "__main__":
    in_workspace = arcpy.GetParameterAsText(0)    # WORKSPACE
    incl_domains = arcpy.GetParameterAsText(1)    # BOOLEAN, DEFAULT True
    incl_counts = arcpy.GetParameterAsText(2)     # BOOLEAN, DEFAULT True
    prefix_chosen = arcpy.GetParameterAsText(3)   # STRING, DEFAULT Other
    out_workspace = arcpy.GetParameterAsText(4)   # WORKSPACE
    out_table_name = arcpy.GetParameterAsText(5)  # STRING
    # CREATE COLUMN NAMES
    if prefix_chosen == "For ETL Source":
        n_prefix = "source_"
        s_prefix = "src_"
        n_suffix = ""
    elif prefix_chosen == "For ETL Target":
        n_prefix = "target_"
        s_prefix = "targ_"
        n_suffix = ""
    else:
        n_prefix = ""
        s_prefix = ""
        n_suffix = "_name"
    w_name = n_prefix + "workspace"
    d_type = n_prefix + "data_type"
    tab_name = n_prefix + "table" + n_suffix
    col_name = n_prefix + "column" + n_suffix
    col_type = s_prefix + "col_type"
    col_width = s_prefix + "col_width"
    col_prec = s_prefix + "col_prec"
    col_alias = s_prefix + "col_alias"
    col_count = s_prefix + "col_cnt"
    col_pc = s_prefix + "col_pc_pop"
    contains_zeros = s_prefix + "col_has_zeros"
    max_len = s_prefix + "col_len"
    min_val = s_prefix + "col_min"
    max_val = s_prefix + "col_max"
    min_date_col = s_prefix + "date_min"
    max_date_col = s_prefix + "date_max"
    dom_flag = s_prefix + "domain_flag"
    col_domain = n_prefix + "domain_name"
    dom_col_type = s_prefix + "domain_col_type"
    col_join = s_prefix + "join_field"
    dom_typ_col = s_prefix + "domain_type"
    try:
        arcpy.AddMessage(" ")
        # GET OBJECTS PRESENT IN THE INPUT WORKSPACE
        arcpy.env.workspace = in_workspace
        # COLLECT FEATURE CLASSES FROM WORKSPACE TREE
        FCs = []
        walk = arcpy.da.Walk(in_workspace, datatype='FeatureClass')
        for dirpath, dirnames, filenames in walk:
            for filename in filenames:
                FCs.append(os.path.join(dirpath, filename))
        # COLLECT TABLES FROM WORKSPACE TREE
        tables = []
        walk = arcpy.da.Walk(in_workspace, datatype='Table')
        for dirpath, dirnames, filenames in walk:
            for filename in filenames:
                tables.append(os.path.join(dirpath, filename))
        # CREATE OUTPUT TABLE
        out_table = arcpy.CreateTable_management(out_workspace, out_table_name)
        # CREATE THE COLUMN HEADINGS LIST
        #               0       1       2         3         4         5          6         7          8
        tableheaders = [w_name, d_type, tab_name, col_name, col_type, col_width, col_prec, col_alias, col_join]
        # IF USER SPECIFIED INCLUDING COLUMN COUNT STATISTICS, ADD THOSE COLUMN HEADINGS
        if str(incl_counts).lower() == "true":
            doCounts = True
            tableheaders.append(col_count)       # 9
            tableheaders.append(col_pc)          # 10
            tableheaders.append(contains_zeros)  # 11
            tableheaders.append(max_len)         # 12
            tableheaders.append(min_val)         # 13
            tableheaders.append(max_val)         # 14
            tableheaders.append(min_date_col)    # 15
            tableheaders.append(max_date_col)    # 16
        else:
            doCounts = False
        # IF USER SPECIFIED INCLUDING DOMAIN STATISTICS, ADD THOSE COLUMN HEADINGS
        if str(incl_domains).lower() == "true":
            doDomains = True
            tableheaders.append(dom_flag)      # 17
            tableheaders.append(col_domain)    # 18
            tableheaders.append(dom_col_type)  # 19
            tableheaders.append(dom_typ_col)   # 20
        else:
            doDomains = False
        # ADD THE COLUMN HEADINGS TO THE OUTPUT TABLE
        add_fields(out_table, tableheaders)
        # GET THE DOMAIN INFO FROM THE GEODATABASE
        # (IF THIS IS NOT A GEODATABASE, THIS FUNCTION WILL RETURN NONE)
        if doDomains == True:
            dom_dict = dict_domainname_domaincolumntype(in_workspace)
        else:
            dom_dict = ""
        arcpy.AddMessage("  Writing workspace schema to file...please wait...")
        # LOOP THROUGH FEATURE CLASSES, WRITING SCHEMA TO FILE
        for x in FCs:
            try:
                arcpy.env.workspace = in_workspace
                desc = arcpy.Describe(x)
                x_path = desc.catalogPath
                rows1 = table_schema_to_file(x_path, doDomains, doCounts, dom_dict)
                arcpy.AddMessage("  Inserting " + str(len(rows1)) + " rows for FC " + str(x) + "...")
                insert_rows(out_table, tableheaders, rows1)
                del rows1
            except:
                exc_type, exc_value, tb = sys.exc_info()
                tbinfo = traceback.format_tb(tb)[0]
                pymsg = tbinfo + "\n" + str(exc_type) + ": " + str(exc_value)
                arcpy.AddError("Python Messages: " + pymsg + " GP Messages: " + arcpy.GetMessages(2))
                continue
        # LOOP THROUGH TABLES, WRITING SCHEMA TO FILE
        for x in tables:
            try:
                rows2 = []
                arcpy.env.workspace = in_workspace
                desc = arcpy.Describe(x)
                x_path = desc.catalogPath
                rows2 = table_schema_to_file(x_path, doDomains, doCounts, dom_dict)
                arcpy.AddMessage("  Inserting " + str(len(rows2)) + " rows for Table " + str(x) + " ...")
                insert_rows(out_table, tableheaders, rows2)
                del rows2
            except:
                arcpy.AddMessage("ERROR: " + str(x))
                arcpy.AddMessage("ERROR: " + str(rows2))
                exc_type, exc_value, tb = sys.exc_info()
                tbinfo = traceback.format_tb(tb)[0]
                pymsg = tbinfo + "\n" + str(exc_type) + ": " + str(exc_value)
                arcpy.AddError("Python Messages: " + pymsg + " GP Messages: " + arcpy.GetMessages(2))
                continue
        # OUTPUT PARAMETER
        arcpy.SetParameter(6, out_table)  # TABLEVIEW
        arcpy.AddMessage(" ")
    except:
        exc_type, exc_value, tb = sys.exc_info()
        tbinfo = traceback.format_tb(tb)[0]
        pymsg = tbinfo + "\n" + str(exc_type) + ": " + str(exc_value)
        arcpy.AddError("Python Messages: " + pymsg + " GP Messages: " + arcpy.GetMessages(2))
    finally:
        del in_workspace, incl_domains, incl_counts, out_workspace, out_table_name, prefix_chosen
        del out_table
        del tableheaders, dom_dict
        del tables, FCs
```
Tags
Workspace, Dictionary, InsertCursor, ListFields, Enumerate, SearchCursor, Excel, Domain, Walk, Describe, CreateTable, Schema, Table
Credits
Ruth Bowers, 2015, 2016
Use limitations
Tested in ArcGIS 10.2 as an ArcGIS custom toolbox script tool. Tested only with all parameters supplied, including the optional parameters. May not function properly without including column stats and domain data.