copy

Thursday, 18 July 2013



A Custom Percentile VBA Function

Introduction

The definition of percentile according to Wikipedia is the following: “A percentile (or a centile) is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall. For example, the 20th percentile is the value (or score) below which 20 percent of the observations may be found. The term percentile and the related term percentile rank are often used in the reporting of scores from norm-referenced tests. For example, if a score is in the 86th percentile, it is higher than 86% of the other scores”.

This week I had to find the 95th percentile from various experimental datasets (velocity profiles).  Since I haven’t done something similar in the past, my initial thought was to use one of the three built-in functions of Excel (2010): PERCENTILE.EXC, PERCENTILE.INC, and PERCENTILE. Note that the last one is available in Excel 2010 for backward compatibility with older Excel versions, and it is quite similar with PERCENTILE.INC function.


Problem description

Although I tried all of them I couldn’t get the desired result. The reason was that all of these functions use interpolation to find the k-th percentile whenever k is not a multiple of 1/(N - 1), where N is the number of ordered values in the input array. Below there is a short description of these functions, as well as a short reference to the interpolation issue based on Excel help:

PERCENTILE.EXC(array,k): Returns the k-th percentile of values in a range, where k is in the range 0..1, exclusive. If k is not a multiple of 1/(N - 1), PERCENTILE.EXC interpolates to determine the value at the k-th percentile.

PERCENTILE.INC(array,k):
Returns the k-th percentile of values in a range, where k is in the range 0..1, inclusive. You can use this function to establish a threshold of acceptance. For example, you can decide to examine candidates who score above the 90th percentile. If k is not a multiple of 1/(N - 1), PERCENTILE.INC interpolates to determine the value at the k-th percentile.

PERCENTILE(array,k): Returns the k-th percentile of values in a range. You can use this function to establish a threshold of acceptance. For example, you can decide to examine candidates who score above the 90th percentile. If k is not a multiple of 1/(N - 1), PERCENTILE interpolates to determine the value at the k-th percentile.

Let’s see an example: Assume that we have 5 values – 1, 2, 3, 4 and 5 – in the range A1:A5. If we calculate the 45th percentile the results using the built-in functions will be the following:
PERCENTILE.EXC(A1:A5;0.45) = 2.7
PERCENTILE.INC(A1:A5;0.45) = 2.8
PERCENTILE(A1:A5;0.45) = 2.8

The problem is that both values – 2.7 and 2.8 – are not included in the original array/range (1 to 5), due to the interpolation issue. So, for this case I needed a function that for a given percentile will return a value from the original array/range. In this specific example the desired function should return the value of 3. For the same reason if the original range was the 5, 6 60, 66, 100, then the results would be 43.8, 49.2 and 49.2 from PERCENTILE.EXC, PERCENTILE.INC and PERCENTILE respectively. However, in that case the desired 45th percentile should be 60, which is quite different from 43.8 and 49.2.


Solution

Since all the built-in functions of Excel use interpolation to find the exact percentile value, I decided to write a custom VBA function in order to bypass this issue. The “CustomPercentile” function is based on the following formula:

Custom Percentile Formula

Where:
n: closest rank
P: percentile
N: ordered values

The P-th percentile (0 <= P <= 100) of N ordered values (arranged from least to greatest) is obtained by first calculating the (ordinal) rank n, rounding the result to the nearest integer, and then taking the value that corresponds to that rank. (Note that the rounded value of n is just the least integer which exceeds N x P/100). So, if we have 30 ordered values, then the 75th percentile will be the = round ((75/100) x 30 + 0.5; 0) = 23rd value.

In the initial example (1, 2, 3, 4, 5) the 45th percentile using this formula will give: n = round ((45/100) x 5 + 0.5; 0) = 3. So, the 45th percentile will be the third value in the ordered values, which in the specific example is the value of 3. Similarly, in the second example (5, 6 60, 66, 100) the 45th percentile will be again the third value, which is 60.


VBA code

Here is the code for “CustomPercentile” function. The function first shorts the (numeric) range from the smallest to the 'largest using a bubble sort function and then calculates the desired percentile value:

Option Explicit
Option Base 1

Function CustomPercentile(InputRange As Range, Percentile As Double) As Variant
    
    '-----------------------------------------------------------------
    'The function calculates the desired percentile of a given range.
    'First it shorts the (numeric) range from the smallest to the
    'largest using BubbleSort function and then calculates the value.
    
    'By Christos Samaras
    'Date: 15/7/2013
    'xristos.samaras@gmail.com
    'http://www.myengineeringworld.net
    '-----------------------------------------------------------------
        
    'Declararing the necessary variables.
    Dim Data() As Double
    Dim i As Long
    Dim j As Long
    Dim NearestRank As Long
               
    'Check if the input range is valid.
    If IsRange(InputRange) = False Then
        CustomPercentile = "Input Rnage is empty or incorrect!"
        Exit Function
    End If
    
    'Check if the input percentile is within the valid range.
    If Percentile < 0 Or Percentile > 100 Then
        CustomPercentile = "Percentile must be between 0 and 100!"
        Exit Function
    End If
    
    'If the input range is has m rows and n columns then the array
    'will have m x n rows and 1 column.
    ReDim Data(InputRange.Rows.Count * InputRange.Columns.Count)
    
    'Check if the input range has numeric values and then create a
    '1-column array with the input range values.
    For i = 1 To InputRange.Rows.Count
        For j = 1 To InputRange.Columns.Count
            If IsNumeric(InputRange.Cells(i, j)) = True Then
                Data(i * j) = InputRange.Cells(i, j)
            Else
                CustomPercentile = "Numeric error in cell " & InputRange.Cells(i, j).Address & "."
                Exit Function
            End If
        Next j
    Next i
            
    'Sort the array from the lowest to the highest value.
    Call BubbleSort(Data)
    
    'Find the nearest rank: n = P/100 * N + 1/2 where:
    'n is the (ordinal) rank, rounding the result to the nearest integer
    'P is the P-th percentile
    'N is the number of ordered values (arranged from least to greatest)
    NearestRank = Round(Percentile / 100 * (InputRange.Rows.Count * InputRange.Columns.Count) + 0.5, 0)
    
    'Return the percentile's value.
    Select Case Percentile
        Case 0: CustomPercentile = Data(1)
        Case 100: CustomPercentile = Data(NearestRank - 1)
        Case Else: CustomPercentile = Data(NearestRank)
    End Select

End Function

Function IsRange(InputRange As Range) As Boolean

    '---------------------------------------------
    'Checks if the input range is a valid range.
    
    'By Christos Samaras
    'Date: 15/7/2013
    'xristos.samaras@gmail.com
    'http://www.myengineeringworld.net
    '---------------------------------------------
    
    IsRange = False
    
    'If the given range intersects the sheet's range, then the range is valid.
    On Error Resume Next
    If Not Application.Intersect(InputRange, Range("A1:XFD1048576")) Is Nothing Then
        IsRange = True
    End If

End Function

Function BubbleSort(ArrayToShort() As Double) As Variant
    
    '------------------------------------------------------------
    'Shorts a given array from the lowest to the highest value.
    'Here it is used for numerical values only.
    
    'By Christos Samaras
    'Date: 15/7/2013
    'xristos.samaras@gmail.com
    'http://www.myengineeringworld.net
    '------------------------------------------------------------
    
    Dim i As Long
    Dim j As Long
    Dim TemporaryValue As Variant

    'Loop through each element in the array and if the current array element
    'is greater than the elelment next to it, switch the two elements.
    For i = LBound(ArrayToShort) To UBound(ArrayToShort) - 1
        For j = i + 1 To UBound(ArrayToShort)
            If ArrayToShort(i) > ArrayToShort(j) Then
                TemporaryValue = ArrayToShort(j)
                ArrayToShort(j) = ArrayToShort(i)
                ArrayToShort(i) = TemporaryValue
            End If
        Next j
    Next i
    
End Function


How to use the function

The function can be used like any Excel built-in function, following the steps below:
  1. Open a workbook or create a new one.
  2. Press ALT + F11 to switch to VBA editor.
  3. From the File menu select import file. Alternatively press CTRL + M.
  4. In the pop up window browse to your hard disk and find the mCustomPercentile.bas (module file). Select the file and press the open button.
  5. That was the difficult part! Now close the VBA editor and return to your workbook.
  6. Select an empty cell and type: =CustomPercentile(your range; your percentile). For example in cell B5 you can type: =CustomPercentile(A1:K1; 65). The function will return the 65th percentile from the values in the range A1:K1.
  7. Finally, if you want to keep the function at your workbook save the workbook as Excel Macro-Enabled workbook or as Excel 97-2003 workbook.


Download it from here


The zip file contains a sample Excel workbook, as well as a VBA module, which you can import to an existing workbook following the instructions above.

Did you like this post? If yes, then share it with your friends. Thank you!


Categories:


Mechanical Engineer (Ph.D. cand.), M.Sc. Cranfield University, Dipl.-Ing. Aristotle University, Thessaloniki - Greece.
Communication: e-mail, Facebook, Twitter, Google+ and Linkedin. More info