Identify Data Format

Data is a collection of facts such as numbers, descriptions, and observations used to record information.

You can classify data as structured, semi-structured, or unstructured.

Structured data

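Structured data is data that adheres to a fixed schema, so all of the data has the same fields or properties. Most commonly, the schema for structured data entities is tabular: each row represents one instance of an entity, and each column represents an attribute of that entity.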
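To make the idea of a fixed schema concrete, here is a minimal Python sketch. The `Customer` class and its four columns are hypothetical and chosen only for illustration; the point is that every row carries exactly the same fields.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Customer:
    """Hypothetical fixed schema: every row has exactly these columns."""
    customer_id: int
    first_name: str
    last_name: str
    email: str


# Each instance is one row of the table.
rows = [
    Customer(1, "Joe", "Jones", "joe@litware.com"),
    Customer(2, "Samir", "Nadoy", "samir@northwind.com"),
]

# The column names come from the schema, not from any individual row.
columns = [f.name for f in fields(Customer)]


def render_table(rows, columns):
    """Return the rows as tab-separated text lines with a header line."""
    lines = ["\t".join(columns)]
    for row in rows:
        lines.append("\t".join(str(getattr(row, c)) for c in columns))
    return lines


if __name__ == "__main__":
    print("\n".join(render_table(rows, columns)))
```

Because the schema is fixed, a row that omits a field or adds an extra one cannot be constructed at all, which is the property that separates structured data from the looser formats described next.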

Semi-structured data

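Semi-structured data is information that has some structure but allows for some variation between entity instances. For example, while most customers may have an email address, some might have multiple email addresses, and some might have none at all. The two customer records that follow, written in JSON, illustrate this: they differ in their address fields and in their contact entries.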

// Customer 1
{
  "firstName": "Joe",
  "lastName": "Jones",
  "address": {
    "streetAddress": "1 Main St.",
    "city": "New York",
    "state": "NY",
    "postalCode": "10099"
  },
  "contact": [
    {
      "type": "home",
      "number": "555 123-1234"
    },
    {
      "type": "email",
      "address": "joe@litware.com"
    }
  ]
}

// Customer 2
{
  "firstName": "Samir",
  "lastName": "Nadoy",
  "address": {
    "streetAddress": "123 Elm Pl.",
    "unit": "500",
    "city": "Seattle",
    "state": "WA",
    "postalCode": "98999"
  },
  "contact": [
    {
      "type": "email",
      "address": "samir@northwind.com"
    }
  ]
}
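Code that reads semi-structured records has to tolerate fields that are missing or repeated. The following Python sketch parses the two customer documents above with the standard `json` module; the helper names `email_addresses` and `address_unit` are hypothetical, and the `//` comments are omitted because standard JSON does not allow comments.

```python
import json

customer_1 = json.loads("""
{
  "firstName": "Joe",
  "lastName": "Jones",
  "address": {"streetAddress": "1 Main St.", "city": "New York",
              "state": "NY", "postalCode": "10099"},
  "contact": [
    {"type": "home", "number": "555 123-1234"},
    {"type": "email", "address": "joe@litware.com"}
  ]
}
""")

customer_2 = json.loads("""
{
  "firstName": "Samir",
  "lastName": "Nadoy",
  "address": {"streetAddress": "123 Elm Pl.", "unit": "500",
              "city": "Seattle", "state": "WA", "postalCode": "98999"},
  "contact": [
    {"type": "email", "address": "samir@northwind.com"}
  ]
}
""")


def email_addresses(customer):
    """Return every email address on the record; the list may be empty."""
    return [
        entry["address"]
        for entry in customer.get("contact", [])
        if entry.get("type") == "email" and "address" in entry
    ]


def address_unit(customer):
    """Return the optional unit number, or None when the field is absent."""
    return customer.get("address", {}).get("unit")


if __name__ == "__main__":
    for customer in (customer_1, customer_2):
        print(customer["firstName"], email_addresses(customer), address_unit(customer))
```

Using `dict.get` with a default, rather than indexing directly, is one simple way to handle a field such as `unit` that appears in some records and not in others.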

Unstructured data

Not all data is structured or even semi-structured. For example, documents, images, audio and video data, and binary files might not have a specific structure. This kind of data is referred to as unstructured data.

Data stores

Organizations typically store data in structured, semi-structured, or unstructured format to record details of entities (for example, customers and products), specific events (such as sales transactions), or other information in documents, images, and other formats. The stored data can then be retrieved for analysis and reporting later.

There are two broad categories of data store in common use:

  • File stores
  • Databases
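As a rough illustration of the two categories, the sketch below stores the same customer names first in a file (CSV format) and then in a database (an in-memory SQLite database). The file name, table name, and column names are assumptions made for this example, and a real system would likely use a different file format or database engine.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

rows = [("Joe", "Jones"), ("Samir", "Nadoy")]


def save_to_file_store(rows, path):
    """Write the rows to a delimited text file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["firstName", "lastName"])
        writer.writerows(rows)


def load_from_file_store(path):
    """Read the whole file back, skipping the header line."""
    with open(path, newline="", encoding="utf-8") as f:
        return [tuple(record) for record in list(csv.reader(f))[1:]]


def save_to_database(rows):
    """Create an in-memory SQLite database holding the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (firstName TEXT, lastName TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    conn.commit()
    return conn


# File store: the application reads the file and filters it itself.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "customers.csv"
    save_to_file_store(rows, csv_path)
    from_file = load_from_file_store(csv_path)

# Database: the database engine runs the query and returns matching rows.
conn = save_to_database(rows)
from_db = conn.execute(
    "SELECT firstName, lastName FROM customer WHERE lastName = ?", ("Nadoy",)
).fetchall()
conn.close()

if __name__ == "__main__":
    print(from_file)
    print(from_db)
```

In this sketch the file store simply persists the data, while the database also takes on the job of querying it.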