Mapping Unstructured Data: How-to
When creating an ODBC driver, it is essential to convert data from the data source into a relational format. This is necessary because ODBC and SQL require and operate on data that is organized with a fixed schema and returned in a tabular structure. By utilizing the SimbaEngine SDK, you can efficiently map your data into a relational format, enabling you to build an ODBC driver compatible with various application vendors, including Power BI, Tableau, Qlik Sense, Alteryx, Informatica, MicroStrategy, Logi Symphony, Excel, and more. This post will explore a few scenarios and demonstrate how each can be mapped into a relational format.
A straightforward example is data stored in CSV files. In this scenario, let’s assume that each row in the CSV file has the same number of columns. An example of such data might look like this:
Row 1,3.450,Data Value 1,1999-09-08
Row 2,6.789,Data Value 2,2000-01-23
In this example, it’s easy to map to a relational model because the data is already represented relationally. The only difficulty is determining the metadata for the columns in the CSV. You could do a few things: map the columns to string, create another CSV file specifically for holding metadata about the other CSV files, create another file (such as XML) that is associated with each CSV file and contains the metadata about that file, or put in a header line containing the metadata to name a few. The only important thing is that you can create code which exposes this data and metadata.
A more complex example involves using a JSON data source. There are several ways to map this type of data. Consider the following (contrived) data sample:
{
"order": {
"id": "1",
"value": "customer1",
"receipt": {
"items": [
{
"value": "item1",
"price": "5.55"
},
{
"value": "item2",
"price": "4.22"
}
]
}
},
"order": {
"id": "2",
"value": "customer2",
"receipt": {
"items": [
{
"value": "item3",
"price": "3.45"
}
]
}
}
}
Notice that in the second JSON document, there is one element in the “items” array, while there are two in the first. To map this data to a relational model, there are two different approaches. In both cases, it is essential to know the complete set of fields present in all the JSON documents. This can be achieved by having the user enter them manually, have the driver sample the available documents (perhaps scanning through the first N documents), or by using a combination of these methods.
Once you have identified the complete set of fields, you can begin mapping this data to a relational model. A straightforward approach is to flatten the data into columns, with each element in the array becoming a separate column. This way, the data would be represented as:
id
|
value
|
receipt.items[0].value
|
receipt.items[0].price
|
receipt.items[1].value
|
receipt.items[0].price
|
1
|
customer1
|
item1
|
5.55
|
item2
|
4.22
|
2
|
customer2
|
item3
|
3.45
|
|
|
In the second row, there is no second element in the array, so NULL values are used instead when returning the relational data. Similarly, if a field appears in one document but not in another, NULL will be substituted for the missing field. While there are other approaches for mapping this data, we will leave those as an exercise for the reader.
A third and final example would be taking a key-value store and mapping that to a relational model. There are many different types of key-value stores, with many different subtleties, and it would make for a very long post if we were to consider all of them. Instead, let’s imagine a hypothetical key-value store which follows the example below:
key1
value1
value2
child2
key2
value3
In this case we have two “master” keys (key1 and key2), where key1 has two values (value1 and value2), and value2 also has a child (child2). key2 only has one value (value3) with no other children. A simple mapping of this would be to treat the keys as columns, their immediate values as columns, and any subsequent children as columns as well. Applying this to the above example would then result in the following mapping:
key
|
firstLevel
|
secondLevel
|
key1
|
value1
|
|
key1
|
value2
|
child2
|
key2
|
value3
|
|
Note that the column names have been synthesized and the types are not specified for this example. It would be easy to adapt the strategies employed in the previous example, which is also left as an exercise for the reader.
There are many other ways to represent this data relationally and numerous other data models that can be mapped. Over the years, we have encountered a variety of data types and successfully mapped all of them into sensible relational representations. Once this mapping is complete, it takes just five days to have your read-only ODBC driver up and running, connecting to the application vendor of your choice!
Get a Demo
Simba Data Connectors provide trusted access to data anywhere, relentlessly optimized for performance and functionality.
What to expect
20-minute discovery call with a product expert
Discover which solutions are best suited for your needs
Live demo tailored to your business requirements
No high pressure sales pitch
"*" indicates required fields
Hidden
Interested in Financial Reporting
Interested in Business Analytics and Dashboards
Interested in Data Warehousing/BI Cubes
Interested in Multiple Companies/Databases Consolidation
Interested in Report Sharing and/or Scheduling
Interested in Power BI
Hidden
How Can We Help?*
-- Select -- Sales Generic Pricing Demo Demo and Pricing Purchase Free Trial Free Trial Request Contact Partnership Request Account Review
Business Email*
First Name*
Last Name*
Phone Number*
Company Name*
What Best Describes You?
Prospective Customer
Current Customer
Partner
Hidden
Do you resell software?
Job Title
Primary Financial System
-- Select One -- Deltek Epicor Infor JD Edwards Microsoft MRI Software NetSuite Oracle Other Sage SAP SYSPRO Viewpoint
Financial System Version
-- Select One -- 24SevenOffice A+ AARO AccountEdge Accounting CS Accountmate Acumatica Alere Anaplan Aptean Assist ASW Aurora (Sys21) Axion Axis BAAN Banner Blackbaud BlueLink Book Works BPCS Cayenta CCH CDK Global CedAr e-financials CGI Advantage Clarus CMiC CMS (Solarsoft) Coda Coins Colleague CPSI CSC CorpTax Custom CYMA DAC Data Warehouse Datatel DATEV Davisware Global Edge Davisware S2K Deacom Deltek Ajera Deltek ComputerEase Deltek CostPoint Deltek Maconomy Deltek VantagePoint Deltek Vision Deltek Vision Cloud DPN Dynamics 365 Business Central Dynamics 365 Finance and Supply Chain Management Dynamics AX Dynamics CRM Dynamics GP Dynamics NAV Dynamics NAV C5 Dynamics SL e5 eCMS Eden (Tyler Tech) Emphasys Entrata Epicor Avante Epicor BisTrack Epicor CMS Epicor Enterprise Epicor Epicor SLS Epicor iScala Epicor Kinetic Epicor LumberTrack Epicor Manage 2000 Epicor Prophet 21 Epicor Tropos Etail Expandable FAMIS Famous Software Fern FinancialForce FireStream FIS FiServ Flexi Fortnox Foundation Fourth Shift Friedman Full Circle GEMS Harris Data (AS/400) HCS HMS IBM Cognos TM1 IBS IBS-DW In-House Developed Incode INFINIUM Infor CloudSuite Financials Infor Distribution SX.e Infor Financials & Supply Management Infor Lawson Infor M3 Infor System21 Infor SyteLine IQMS iSuite Jack Henry Jenzabar JobBOSS Jonas Construction M1 Macola MACPAC Made2Manage MAM MAM Autopart Manman Mapics McLeod MEDITECH MFG Pro MicrosOpera MIP Mitchell Humphrey Movex MRI MRI Commercial Management MRI Financials MRI Horizon MRI Horizon CRE MRI Qube Horizon MRI Residential Management MSGovern Munis (Tyler Tech) New World Systems Onesite Onestream XF Open Systems Oracle E-Business Suite (EBS) Oracle EPM Cloud Oracle ERP Cloud Oracle Essbase Oracle Financial Consolidation and Close (FCCS) Oracle Fusion Oracle Hyperion Enterprise Oracle Hyperion Financial Management (HFM) Oracle Hyperion Planning Oracle PeopleSoft Oracle Planning and Budgeting Cloud Service (PBCS) Oracle Tax Reporting PDI Penta Plexxis PowerOffice PRMS Pro Contractor ProLaw Q360 QAD Quantum Qube Horizon QuickBooks Desktop Premier QuickBooks Desktop Pro Quickbooks Enterprise QuickBooks Online Quorum RealPage REST API Retalix Ross Sage 100 Sage 100 Contractor Sage 200 Sage 300 Sage 300 CRE (Timberline) Sage 500 Sage 50cloud Accounting Sage AccPac Sage Adonix Tolas Sage Estimating Sage Intacct Sage Intacct Budgeting & Planning Sage MAS Sage X3 SAP BPC (HANA, MS or Netweaver) SAP Business ByDesign SAP Business One (B1) SAP Business Warehouse (BW) SAP ERP SAP ERP Central Component (ECC) SAP S/4HANA SmartStream Spokane Springbrook SQL Server Analysis Services (SSAS) Standalone DB with ODBC/DSN connection Standalone IBM DB Standalone Oracle DB Standalone SQL DB SUN Sunguard SunSystems Sys21 SyteLine TAM (Applied Systems) Thomson Reuters Tax Timberline TIMELINE Traverse TripleTex Unit4 Unit4 Agresso Unit4 Business World Unit4 Coda USL Financials Vadim VAI-System 2000 Vantage Vertex Viewpoint Spectrum Viewpoint Vista Visma Winshuttle Wolters Kluwer CCH Tagetik WorkDay Xero xLedger Xperia Yardi Yardi-SaaS
Hidden
Area of Interest
-- Select One -- Accounting & Treasury Automation & Data Management Budgeting & Planning Close & Consolidation Embedded Analytics Financial Reporting Operational Reporting & Analytics Tax & Compliance
Hidden
Current ERP Implementation Partner
Product of Interest
-- Select a Product -- Agility PIM Angles Angles Enterprise for Oracle - Reporting for Oracle Angles Enterprise for SAP Angles Professional Atlas BizNet Bizview Calumo Certent - Certent Disclosure Management - Certent DisclosureNet - Certent Equity Management Clausion Consolidation - Tabella Cubeware CXO Dundas BI Event 1 Software Exago Hubble IDL - IDL Cockpit - IDL Designer - IDL Forecast - IDL Konsis - IDL Publisher Izenda Jet - Jet Analytics - Jet Basics - Jet Reports Kalido Legerity Financials - FastPost Logi Composer Logi Info Logi Report Logi Symphony Longview - Longview Analytics - Longview Close - Longview Plan - Longview Tax - Longview Transfer Pricing Mekko Graphics Power ON Process Runner - Financial Optimization for SAP Process Runner GLSU Simba - Simba Drivers - Simba SDK SourceConnect Spreadsheet Server Tidemark Viareport - Viareport Consolidation - Viareport Lease Vizlib Wands - Wands for Oracle - Wands for SAP
BI Tool
Tableau Power BI Qlik Looker Domo AWS - QuickSight ThoughtSpot Custom Build Other
Country
-- Select -- Afghanistan Albania Algeria American Samoa Andorra Angola Antigua and Barbuda Argentina Armenia Australia Austria Azerbaijan Bahamas Bahrain Bangladesh Barbados Belarus Belgium Belize Benin Bermuda Bhutan Bolivia Bosnia and Herzegovina Botswana Brazil Brunei Bulgaria Burkina Faso Burundi Cambodia Cameroon Canada Cape Verde Cayman Islands Central African Republic Chad Chile China Colombia Comoros Congo, Democratic Republic of the Congo, Republic of the Costa Rica Croatia Cuba Curaçao Cyprus Czech Republic Denmark Djibouti Dominica Dominican Republic Ecuador Egypt El Salvador Equatorial Guinea Eritrea Estonia Ethiopia Faroe Islands Fiji Finland France French Polynesia Gabon Gambia Georgia Germany Ghana Greece Greenland Grenada Guam Guatemala Guinea Guinea-Bissau Guyana Haiti Honduras Hong Kong Hungary Iceland India Indonesia Iran Iraq Ireland Israel Italy Jamaica Japan Jordan Kazakhstan Kenya Kiribati Kosovo Kuwait Kyrgyzstan Laos Latvia Lebanon Lesotho Liberia Libya Liechtenstein Lithuania Luxembourg North Macedonia Madagascar Malawi Malaysia Maldives Mali Malta Marshall Islands Mauritania Mauritius Mexico Micronesia Moldova Monaco Mongolia Montenegro Morocco Mozambique Myanmar Namibia Nauru Nepal Netherlands New Zealand Nicaragua Niger Nigeria North Korea Northern Mariana Islands Norway Oman Pakistan Palau Palestine, State of Panama Papua New Guinea Paraguay Peru Philippines Poland Portugal Puerto Rico Qatar Romania Russia Rwanda Saint Kitts and Nevis Saint Lucia Saint Vincent and the Grenadines Samoa San Marino Sao Tome and Principe Saudi Arabia Senegal Serbia Seychelles Sierra Leone Singapore Sint Maarten Slovakia Slovenia Solomon Islands Somalia South Africa South Korea Spain Sri Lanka Sudan Sudan, South Suriname Swaziland Sweden Switzerland Syria Taiwan Tajikistan Tanzania Thailand Togo Tonga Trinidad and Tobago Tunisia Turkey Turkmenistan Tuvalu Uganda Ukraine United Arab Emirates United Kingdom United States Uruguay Uzbekistan Vanuatu Venezuela Vietnam Virgin Islands, British Virgin Islands, U.S. Yemen Zambia Zimbabwe
State/Province
-- Select State -- Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware District of Columbia Florida Georgia Hawaii Idaho Illinois Indiana Iowa Kansas Kentucky Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi Missouri Montana Nebraska Nevada New Hampshire New Jersey New Mexico New York North Carolina North Dakota Ohio Oklahoma Oregon Pennsylvania Rhode Island South Carolina South Dakota Tennessee Texas Utah Vermont Virginia Washington West Virginia Wisconsin Wyoming -- Select Province -- Alberta British Columbia Manitoba New Brunswick Newfoundland and Labrador Nova Scotia Northwest Territories Nunavut Ontario Prince Edward Island Quebec Saskatchewan Yukon
I agree to receive digital communications from insightsoftware and its partners (
click here for full list of partners) containing news, product information, promotions, or event invitations. I understand that I can withdraw my consent at any time.
Privacy Policy.
Hidden
Google Ads Click ID
Hidden
Experiment ID & Variant
Hidden
dataLayer Form Type
Hidden
Is Google Experiment
Hidden
Salesforce Campaign
Hidden
Chili Piper Product Line
Hidden
Chili Piper Country
Hidden
Chili Piper ERP System
Hidden
Chili Piper ERP Version
Company Type
Hidden
Annual Revenue Bucket
Hidden
3rd Party Company Name
Hidden
3rd Party Industry
Hidden
3rd Party Sub Industry
Hidden
3rd Party Employee Range
Hidden
3rd Party Revenue Range
Hidden
3rd Party Street Address
Hidden
3rd Party Data Source
Hidden
3rd Party Primary SIC
Hidden
3rd Party Watch List Account Type
Hidden
3rd Party Watch List Account Status
Hidden
3rd Party Watch List Campaign Code
Hidden
3rd Party Watch List Account Owner
Hidden
3rd Party Primary NAICS
Comments
This field is for validation purposes and should be left unchanged.
Δ
The post Mapping Unstructured Data: How-to appeared first on insightsoftware.
80