Invalid input data that crashes your software midway through execution is costly, both in wasted processing power and in the time needed to resolve the problem.
Avoid wasting resources by refusing to process data that is known to be invalid, and provide clear feedback on errors in the input data to facilitate quick resolution.
When processing data, validate as much of the input as possible before processing. This typically involves checking whether:
If any of these checks fail, halt processing and provide feedback on what is wrong with the input data. Feedback can take the form of a simple message to the user, a detailed error log for internal use, or a notification event sent to a logging and auditing system.
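The validate-first flow described above might be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names `validate_rows` and `run` are hypothetical, and the only check shown (every required field has a non-empty value) is an assumed example of the kind of check you might perform.

```python
def validate_rows(rows, required_fields):
    """Return a list of problems found; an empty list means the input is valid."""
    errors = []
    for row_number, row in enumerate(rows, start=1):
        for field in required_fields:
            # Assumed check: a missing key, None, or an empty string counts as missing.
            if not row.get(field):
                errors.append(f"row {row_number}: missing value for '{field}'")
    return errors


def run(rows, required_fields, process):
    """Validate all rows first; call `process` only when no errors were found."""
    errors = validate_rows(rows, required_fields)
    if errors:
        return errors  # halt before any processing happens
    for row in rows:
        process(row)
    return []
```

Because validation collects every problem before returning, the caller receives the complete list of issues in one run instead of discovering them one crash at a time.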
{{<tip text=` In cases where confidentiality or security is a concern, consider providing a generic error message to the user while recording a detailed error log for internal use. Example:
An error occurred while processing your request.
We apologize for the inconvenience. Our development team has been notified and will take corrective action.
If you want to be notified when the issue is resolved,
please contact our support team and provide the following reference number: 1234567890
` >}}
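One way to split a generic user message from a detailed internal log is sketched below. The logger name `data_loader`, the function `report_failure`, and the ten-character reference format are all assumptions made for illustration; adapt them to your own logging setup.

```python
import logging
import uuid

logger = logging.getLogger("data_loader")  # hypothetical logger name


def report_failure(error):
    """Log full details internally and return a generic, user-safe message."""
    reference = uuid.uuid4().hex[:10]  # hypothetical reference format
    # The detailed error goes to the internal log only, keyed by the reference.
    logger.error("reference=%s details=%r", reference, error)
    return (
        "An error occurred while processing your request. "
        "Please contact our support team and provide the following "
        f"reference number: {reference}"
    )
```

The user sees only the reference, while support staff can look up the same reference in the internal log to find the full details.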
The following factors support effective application of the practice:
The following factors prevent effective application of the practice:
While the approach brings numerous benefits in terms of efficiency and reliability, it can also lead to several unexpected or undesired outcomes:
To mitigate the potential negative consequences of the approach:
Say you are writing code to process a data file containing information about characters in a fictional universe.
Your data supplier tells you the file will have the following fields: `allegiance`, `homeworld`, `species`, and `name`.
The goal is to convert this input data into a Markdown table for use in a publication.
You are given the file `star_wars_characters_example.csv` as a reference input file.
```csv
name,height,mass,hair_color,skin_color,eye_color,birth_year,gender,homeworld,species,allegiance
Luke Skywalker,172,77,blond,fair,blue,19BBY,male,Tatooine,Human,Light Side
C-3PO,167,75,NA,gold,yellow,112BBY,NA,Tatooine,Droid,Light Side
R2-D2,96,32,NA,"white, blue",red,33BBY,NA,Naboo,Droid,Light Side
Darth Vader,202,136,none,white,yellow,41.9BBY,male,Tatooine,Human,Dark Side
```
Passing this input file to your script should result in the following Markdown table:
| allegiance | homeworld | species | name |
|------------|-----------|---------|----------------|
| Light Side | Tatooine | Human | Luke Skywalker |
| Light Side | Tatooine | Droid | C-3PO |
| Light Side | Naboo | Droid | R2-D2 |
| Dark Side | Tatooine | Human | Darth Vader |
You can write a simple Python script that processes the file and generates the Markdown output table. The script below reads the input file row by row and checks that every required field is present. When a field is missing, the script prints an error message and exits.
```python
import csv
import sys
from enum import Enum


class HeaderFields(Enum):
    """The fields required in the input file, in output column order."""
    ALLEGIANCE = 'allegiance'
    HOME_WORLD = 'homeworld'
    SPECIES = 'species'
    NAME = 'name'


def process_file(file_to_process):
    with open(file_to_process, newline='') as file:
        # Print the Markdown table header and separator rows.
        print('|', ' | '.join(field.value for field in HeaderFields), '|')
        print('|', ' | '.join('---' for _ in HeaderFields), '|')
        data = csv.DictReader(file)
        for i, row in enumerate(data):
            output = []
            for field in HeaderFields:
                # Fail as soon as a required field is absent from the row.
                if field.value not in row:
                    print('Error: Missing field [', field.value, '] in input file at line:', i + 1)
                    sys.exit(1)
                output.append(row[field.value])
            print('|', ' | '.join(output), '|')


def main():
    if len(sys.argv) < 2:
        print("Usage: python dataLoader.py <filename>")
        sys.exit(1)
    input_file = sys.argv[1]
    process_file(file_to_process=input_file)


if __name__ == "__main__":
    main()
```
Your script works as expected, but when you receive the real data file from the supplier, the output of your code is:
```text
| allegiance | homeworld | species | name |
|------------|-----------|---------|------|
Error: Missing field [ allegiance ] in input file at line: 1
```
Given the error message, you can see that the `allegiance` field is missing in the input file.
The script stops processing the file and exits, so you have to change your code or data file to match the expected structure.
Imagine you have a large number of data files with many lines, and you need to check each of them for missing fields. In this example the number of fields is small, but in a real-world scenario you could have dozens (if not hundreds) of fields to check. Running the program for each file, checking the output, and making the required change one error at a time is tedious.
Rather than processing the file line by line until you encounter an error, you can read the first line of the file and check whether all the required fields are present.
The `main` function is updated to first validate the input file, and then process the file only if the validation is successful.
```python
def main():
    if len(sys.argv) < 2:
        print("Usage: python dataLoader.py <filename>")
        sys.exit(1)
    input_file = sys.argv[1]
    # Validate first; process the file only when validation succeeds.
    is_valid = validate_file(file_to_check=input_file)
    if not is_valid:
        print('Error: Invalid input file, SKIPPING [', input_file, ']')
    else:
        process_file(file_to_process=input_file)
```
The new `validate_file` function reads the first line of the file and checks whether all the required fields are present.
Instead of stopping at the first problem, it reports every field that is missing from the provided input file.
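```python
def validate_file(file_to_check):
    is_valid = True
    with open(file_to_check, newline='') as file:
        # Read only the header row; an empty file yields an empty header
        # instead of raising StopIteration.
        header = next(csv.reader(file), [])
    for field in HeaderFields:
        if field.value not in header:
            print('Error: Missing field [', field.value, '] in input file [', file_to_check, ']')
            is_valid = False  # keep checking so every missing field is reported
    return is_valid
```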
If you run the script with the real data file, the output will be:
```text
Error: Missing field [ allegiance ] in input file [ star_wars_hair_dressers_database.csv ]
Error: Missing field [ homeworld ] in input file [ star_wars_hair_dressers_database.csv ]
Error: Missing field [ species ] in input file [ star_wars_hair_dressers_database.csv ]
Error: Invalid input file, SKIPPING [ star_wars_hair_dressers_database.csv ]
```
Taking a look at the error message, you can see that the `allegiance`, `homeworld`, and `species` fields are missing in one of the input files.
This is suspicious, as the supplier told you that all files would have the same structure. Taking a closer look at the erroneous file, you can see that the supplier made a mistake and sent you the data intended for the ‘Galactic Hairdressers Data Convention’.
```csv
name,hair_color
Luke Skywalker,blond
C-3PO,NA
R2-D2,NA
Darth Vader,none
```
You can now contact the supplier and ask for the correct file.
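For the many-files scenario mentioned earlier, the same header check can be applied to a whole batch before any file is processed. The sketch below is one possible approach under stated assumptions: `REQUIRED_FIELDS`, `missing_fields`, and `check_files` are hypothetical names introduced here, and the required fields are repeated as a plain tuple so the block stands alone.

```python
import csv

# Assumed to mirror the four required fields used in this example.
REQUIRED_FIELDS = ("allegiance", "homeworld", "species", "name")


def missing_fields(path, required=REQUIRED_FIELDS):
    """Return the required fields absent from the header row of `path`."""
    with open(path, newline="") as file:
        # An empty file yields an empty header, so every field is reported.
        header = next(csv.reader(file), [])
    return [field for field in required if field not in header]


def check_files(paths):
    """Map each invalid file to its missing fields; valid files are omitted."""
    report = {}
    for path in paths:
        missing = missing_fields(path)
        if missing:
            report[path] = missing
    return report
```

Calling `check_files` on the full list of supplied files returns a single report of every file that needs attention, which you could forward to the supplier in one message rather than file by file.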
{{<tip text=Failing fast with full feedback allows you to save valuable system resources and troubleshoot the issue quickly. In real-world scenarios, you could have hundreds of files to check, and the processing of the data is likely to be computationally expensive.
>}}