The last installment of this three-part series deals with data normalization: the process of aligning data from one source and loading it to a destination. This topic, along with our previous conversations on network connectivity and data transfer technologies, rounds out our series on the Challenges of Data Consolidation.
Data normalization in this case involves aligning two different data sources in the areas of naming, units, and scaling. Most of this alignment is handled by what is typically referred to as an Extract, Transform, Load (ETL) process: taking data from one source, mapping or modifying the data, and then loading it to a destination. Effective planning for mapping and modifying the data is key to a successful integration.
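To make the three stages concrete, here is a minimal sketch of an ETL pass in Python. The data shapes, point names, and mapping table are hypothetical placeholders, not any particular product's API.

```python
def extract(source_rows):
    """Pull raw readings from the source system (here, just a list of dicts)."""
    return [row for row in source_rows if row.get("value") is not None]

def transform(rows, name_map):
    """Rename points and apply unit conversions before loading."""
    normalized = []
    for row in rows:
        dest_name, convert = name_map[row["point"]]
        normalized.append({"point": dest_name, "value": convert(row["value"])})
    return normalized

def load(rows, destination):
    """Write the normalized readings to the destination store (a plain dict here)."""
    for row in rows:
        destination[row["point"]] = row["value"]

# Example run with a hypothetical mapping table worked out during planning.
name_map = {"PDU-01 Power": ("pdu01_power_kw", lambda watts: watts / 1000.0)}
destination = {}
load(transform(extract([{"point": "PDU-01 Power", "value": 4200.0}]), name_map), destination)
print(destination)  # {'pdu01_power_kw': 4.2}
```

Keeping the mapping table separate from the code that moves the data makes it easy to review during planning, which is where most integration problems are caught.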
The end goal is to format the data to accommodate the visualization system in use, or any preferences on presenting the data. The planning must include a good understanding of any unit conversions. These might include simple scaling, such as watts to kilowatts, or calculations, such as converting BTU per hour to kilowatts. In some cases, data in one system does not align well with the other because a value is missing. If a system has the parameters needed to calculate that value, the ETL transform step can perform the calculation and align the results. Name mapping is best done in a spreadsheet initially, to make sure every data point is identified with both a source and a destination name.
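As a rough illustration, the transform step for the conversions above might look like the following. The conversion constants are standard; the voltage, current, and power-factor inputs used to derive a missing kilowatt value are hypothetical.

```python
def watts_to_kilowatts(watts):
    """Simple scaling: W -> kW."""
    return watts / 1000.0

def btu_per_hr_to_kilowatts(btu_per_hr):
    """Unit conversion: 1 BTU/hr is approximately 0.000293071 kW."""
    return btu_per_hr * 0.000293071

def derive_kw(volts, amps, power_factor=1.0):
    """Calculate power when a source lacks a kW point but exposes V and A."""
    return (volts * amps * power_factor) / 1000.0

print(watts_to_kilowatts(4200))                   # 4.2
print(round(btu_per_hr_to_kilowatts(12000), 2))   # 3.52
print(derive_kw(208, 16, 0.95))                   # 3.1616
```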
Some software platforms, such as a BMS or DCIM, have this ETL capability natively. If it is native, the software is typically configured with a target IP address and authentication parameters, which define the connection to the other system's standard data exchange. Native support sounds sufficient, but it typically leaves out critical parameters due to customization or data that does not translate easily. Without effective native support, a quality data integration tool is required, along with expert consulting that understands the details of how each platform stores and shares data.
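A hypothetical native-integration configuration illustrates the gap: the built-in setup usually captures little more than the connection itself, so customized points still need an explicit mapping maintained alongside it. All field and point names below are made up for illustration.

```python
# Sketch of what a native integration typically asks for: a target address
# and credentials. Field names here are illustrative, not a vendor schema.
integration_config = {
    "target_ip": "10.0.20.15",      # data exchange endpoint of the other system
    "port": 443,
    "auth": {"username": "etl_service", "password_ref": "vault://dcim/etl"},
    "poll_interval_seconds": 300,
}

# Points the native integration cannot translate on its own get an explicit
# mapping maintained alongside the configuration.
custom_point_map = {
    "Chiller-03 Custom Efficiency": "chiller03_efficiency_pct",
}
```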
In summary, effective planning for mapping and modifying the data is key to a successful integration. The end goal is to format the data to accommodate the visualization system in use or any preferences on presenting the data.
This wraps up my three-part series on data consolidation. Remember, data consolidation has to be treated as a project and not just a simple task. Planning and breaking down the tasks will help you understand the scope of work. There are many decisions to make throughout the process, and understanding whether this is a feat you want to tackle internally or with the help of a vendor is really the first question that needs to be answered.
Jason is a part of Schneider Electric’s Data Center Software Solutions Team. More information on our software solutions can be found here. Or visit Schneider Electric DCIM support.
Conversation
Reuben Khunou
9 years ago
Interesting article. My two cents' worth: shouldn't the first phase of planning, as critical as it is, verify and validate data integrity before the second phase as indicated? It has been noticed over time that managers need to take the opportunity to clean up data inconsistencies from the beginning, before analysing, interpreting and using data for strategic and comparative advantage for their respective organisations.
Jason Thurmond
9 years ago
Reuben,
Yes, you are absolutely right. Data validation is a critical step, and it should be thought of during planning. We typically focus on data validation once the data arrives in the consolidated database, assuming you took some steps to clean it during the translation steps. After the data is normalized in the larger database, you can clean it by looking for “out of bounds” conditions or gaps in the data. These would need to be addressed with some basic rules, such as replacing the dirty data with averages of the surrounding data but marking it as modified. I have seen cases where we try to fully clean data during the translation steps from various sources but just end up repeating the work when the data arrives at the destination.
Thanks for the very good point.
Jason