When working with data, it is important to understand the different data types and how to store and use them. This blog post is written for a data novice and will cover some of the basic data types, their uses and sizes.

 

Bytes:

A byte is a unit of data that is made up of 8 bits. Bytes are written in binary. Because they are written in binary and have little encoding required, they are one of the smallest data types.

 

Numbers: (Int, Float, double)

A number is a number and that’s that right? Not when you’re talking to a database. Believe it or not, there are several types of numbers when you’re talking data types.

  • Int: short for ‘integer’, this is a whole number (it has no decimal places). Depending on the program you are using, this story might end here, however some programs want you to define exactly how big your integers are going to be. If you’re recording the ages of people (in whole years), you know that your integers aren’t going to get larger than 3 digits so you are probably safe to use int16 (until we crack the code of living forever, but even then, you could go up to 32,767 years old with int16.)
  • Float: Float is the data term used for decimal numbers.
  • Double: A double, is simply a float that can hold double the amount of detail (decimal places).

 

Strings:

String is short for string of characters. To an absolute data novice, if you are thinking of words, then these will be written as a string in your data base. There are various types of ways to store strings, including wvString and VarChar. It is important to note that although we must store words as strings, we can also store numbers as strings. If a number is stored as a string, then you cannot do any mathematical calculation on it. You may come across an error where you want to get the total of a column of numbers and cannot understand why the program you are working in will not allow you to do so – this may be due to the fact that you erroneously stored this column as a string data type, rather than int or float.

 

Date and time:

The date/time related data type; including Date, Time and DateTime is a special data type. This is because, depending on the program, it is often a number coerced into a date type.

 

Boolean:

This data type might be slightly less familiar to a data novice, but the type is easy to grasp. Unlike the answer to whether or not Schrödinger’s cat is dead; there can only be two possible options for a boolean – true or false, 1 or 0.

 

Null:

NULL in a database is a lack of value. It is important to note that NULL and blanks or empty strings are not the same. Though nothing can be done with a Null value, checks can be made for Null values and appropriate calculations made accordingly. More on Nulls below.

 

Objects:

Depending on the language or program where your data is being stored, Objects and Arrays may also be another way to store your data. In simple terms, an Array is a list. It could be a list of strings, a list of integers, a list of doubles etc. or any combination of these or more. An array will hold information that belongs together. Similarly, a data object is a list of key-value pairs. For example: { FirstName: “John”, LastName: “Doe”, Age: 40, Height: “180cm”}.
Objects are more often used in programming and utilised in JSON (JavaScript Object Notation) which is commonly used when uploading data to web-based applications or accessing data from APIs.

 

Blobs:

Blobs (Binary Large Objects) are a special type of object. As the name suggests, they are objects written in binary. Because they are a lightweight, efficient way to store large/complex objects Blobs are often used to store media such as images, sound files or videos. You can read more about this object type and considerations for storing them in this article.

 

Data types, sizes and descriptions

Type

Data Size

Size in Character Length

Description

Byte

1 byte (8 bits)

0 – 255 (numbers only)

Only positive whole numbers between 0 & 255 can be stored as bytes in Alteryx

Int16

2 bytes

–32,768 to 32,767

Only whole numbers between –32,768 and 32,767

Int32

4 bytes

–2,147,483,648 to 2,147,483,647

Only whole numbers between –2,147,483,648 and 2,147,483,647

Int64

8 bytes

–9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Only whole numbers between –9,223,372,036,854,775,808 and 9,223,372,036,854,775,807

Float

4 bytes

+/- 3.4 x 10-38 to 3.4 x 1038 with 7 digits precision

Short for “floating point value” a float is simply a decimal number.

Double

8 bytes

+/- 1.7 x 10-308 to 1.7 x 10308 with 15 digits

A double is short for “double precision floating point value” and stores numbers twice the length of a float and therefore also takes up twice the memory of a float.

BLOB

Up to several hundred gigabytes

Stored in bytes. Length can vary

Usually used to store media files such as images, video or sound.

For more information on data types and sizes, read this Alteryx page.

 

Null != empty != 0 and when to use Null vs 0?

0 is a number value that could be expressed as an int, float or double. Because it is a number, mathematical equations can be applied to this value. It will take up the amount of space it has been allocated in the database (depending on if this was an int, float or double).
NULL on the other hand, is a lack of value. Nothing can be done “with” a NULL value however checks can be made to “look for” a NULL value. We can filter or check for NULLs and manipulate data in rows with NULLs or the NULL value itself for example.
Blank or Empty cells in a table are usually made up of an empty string. This may occur when a user tabs through an online form or leaves spaces, but no typed answer.

 

Should you convert NULL values to 0?

Though it may be tempting to convert all NULL values in a column full of numbers to a 0, it is important to think of the implications of doing this. If you later need to calculate an average using this column, the presence of zeros that are not truly zeros will bring down your average. You may also want to filter out rows with null in a particular column. For this reason, it is generally best to leave NULLs as they are.

 

I hope this article has helped you to understand the different data types, and that you can now continue on your data analyst journey armed with this knowledge.

 

 

Emma Wishart
Author: Emma Wishart