Recently I wanted to know the average length of the data in a column. I was looking for something like.
SELECT AVG(LEN(column_name)) FROM TABLE_NAME
However, the results I received were not what I was expecting. This was due to white spaces, more precisely, leading and trailing white spaces in the data. Since I was not interested in the leading/trailing white spaces, I simply used the TRIM functions to trim the data before calculating the length of the data. Here is what I came up with:
SELECT AVG(LEN(LTRIM(RTRIM(column_name)))) FROM TABLE_NAME
This worked perfectly fine for me. However, while I was searching for the solution to my problem, I found out that there are actually two functions to calculate the length, LEN() and DATALENGTH() and both appear to do the same thing, i.e., calculate the length of a single record stored in a column.
Despite their similarities, there are subtle differences between the two. To explain I created a set of few SQLs statements as follows that render different results.
SELECT LEN(LTRIM(RTRIM(' 1234 ')))
SELECT LEN(' 1234 ')
SELECT DATALENGTH(' 1234 ')
Executing these SQLs will result in 4, 8 and 12 respectively. This is because in the first one we are trimming down any leading or trailing white spaces which leaves only 4 characters.
The last two are showing behaviour of the functions on a dataset with leading and trailing white spaces. The LEN() function treats the leading white spaces as valid data and discards the trailing white spaces before calculating the length of the dataset. As a result we are left with 4 white spaces and then the characters 1234, which results in a overall length of 8.
The DATALENGTH() function takes all the characters in the dataset, even if they are white spaces, whether trailing or leading. As a result you have 4 leading white spaces, 4 trailing white spaces, and 1234 characters in the middle, that makes up a total of 12 characters length for this dataset.
Both of these functions will render same results, i.e., 4, for the following SQLs.
SELECT DATALENGTH(LTRIM(RTRIM(' 1234 ')))
SELECT LEN(LTRIM(' 1234 '))
Note that we are using only LTRIM() function when using LEN() to calculate the data length. This is because LEN() removes trailing white spaces by default anyways.
In my opinion, I would always use DATALENGTH() function whether the white spaces have any significance or not. Depending on the significance of white spaces, I can always combine the DATALENGTH() method with LTRIM() and RTIRM() to get the desired results. At least in this way I would be in control of what I am trying to achieve.
No comments:
Post a Comment