copy

Sunday, 20 October 2013



Read And Write PDF Forms From Excel


Introduction



This post is based on a blog reader request. Some days ago Mac commented in an old post about a problem he had with a PDF form. He actually wanted to fill the fields of a PDF form using data from an XML file. To tell you the truth, I had never worked with PDF forms up to that time, so it was an excellent opportunity to learn something new. So, although this post doesn’t answer Mac’s request directly, it provides some ideas to anyone who wants to exchange data with PDF forms using Excel and Adobe Professional. Moreover, if you have data in an XML file, you can import the data in Excel and then run the macros provided here.

For demonstration purposes, I have created a sample conference PDF form (Test Form). Below you will find VBA code for writing and reading data from Test Form. The WritePDFForms macro uses the data in sheet Write to fill the Test Form, which is located in the same folder with the sample workbook. The data from each row is used to create a new PDF file, which is saved in the Forms subfolder. The First Name and Last Name column in each row are combined to create the name of the PDF file. On the other hand, the ReadPDFForms macro loops through all the files in the specified folder (Forms). If a PDF file is found, the macro opens it, reads specific fields from the file and writes the values in the sheet Read. The code of both macros can be easily adjusted to work with any PDF form, as long as the field names of the PDF form are known.



VBA code



Here is the VBA code for WritePDFForms macro.

Option Explicit

Sub WritePDFForms()
      
    '--------------------------------------------------------------------------------------
    'This macro uses the data in sheet Write in order to fill a sample PDF form named
    'Test Form, which is located in the same folder with this workbook. The data from
    'each row is used to create a new PDF file, which is saved in the Forms subfolder.
        
    'The code uses late binding, so no reference to external library is required.
    'However, the code works ONLY with Adobe Professional, so don't try to use it with
    'Adobe Reader because you will get an "ActiveX component can't create object" error.
    
    'Written by:    Christos Samaras
    'Date:          15/10/2013
    'e-mail:        xristos.samaras@gmail.com
    'site:          http://www.myengineeringworld.net
    '--------------------------------------------------------------------------------------

    'Declaring the necessary variables.
    Dim strPDFPath              As String
    Dim strFieldNames(1 To 11)  As String
    Dim i                       As Long
    Dim j                       As Integer
    Dim LastRow                 As Long
    Dim objAcroApp              As Object
    Dim objAcroAVDoc            As Object
    Dim objAcroPDDoc            As Object
    Dim objJSO                  As Object
    Dim strPDFOutPath           As String
           
    'Disable screen flickering.
    Application.ScreenUpdating = False
    
    'Specify the path of the sample PDF form.
    'Full path example:
    'strPDFPath = "C:\Users\Christos\Desktop\Test Form.pdf"
    'Using workbook path:
    strPDFPath = ThisWorkbook.Path & "\" & "Test Form.pdf"
    
    'Set the required field names in the PDF form.
    strFieldNames(1) = "First Name"
    strFieldNames(2) = "Last Name"
    strFieldNames(3) = "Street Address"
    strFieldNames(4) = "City"
    strFieldNames(5) = "State"
    strFieldNames(6) = "Zip Code"
    strFieldNames(7) = "Country"
    strFieldNames(8) = "E-mail"
    strFieldNames(9) = "Phone Number"
    strFieldNames(10) = "Type Of Registration"
    strFieldNames(11) = "Previous Attendee"
    
    'Find the last row of data in sheet Write.
    With shWrite
        .Activate
        LastRow = .Cells(.Rows.Count, "B").End(xlUp).Row
    End With
    
    'Loop through all rows of sheet Write and use the data to fill the PDF form.
    For i = 4 To LastRow
    
        On Error Resume Next
        
        'Initialize Acrobat by creating the App object.
        Set objAcroApp = CreateObject("AcroExch.App")
        
        'Check if the object was created.
        If Err.Number <> 0 Then
            MsgBox "Could not create the App object!", vbCritical, "Object error"
            'Release the object and exit.
            Set objAcroApp = Nothing
            Exit Sub
        End If
        
        'Create the AVDoc object.
        Set objAcroAVDoc = CreateObject("AcroExch.AVDoc")
        
        'Check if the object was created.
        If Err.Number <> 0 Then
            MsgBox "Could not create the AVDoc object!", vbCritical, "Object error"
            'Release the objects and exit.
            Set objAcroAVDoc = Nothing
            Set objAcroApp = Nothing
            Exit Sub
        End If
        
        On Error GoTo 0
        
        'Open the PDF file.
        If objAcroAVDoc.Open(strPDFPath, "") = True Then
            
            'Set the PDDoc object.
            Set objAcroPDDoc = objAcroAVDoc.GetPDDoc
           
            'Set the JS Object - Java Script Object.
            Set objJSO = objAcroPDDoc.GetJSObject
            
            On Error Resume Next
            
            'Fill the form fields.
            For j = 1 To 10
                
                objJSO.GetField(strFieldNames(j)).Value = CStr(shWrite.Cells(i, j + 1).Value)
                
                If Err.Number <> 0 Then
                    
                    'Close the form without saving the changes.
                    objAcroAVDoc.Close True
                    
                    'Close the Acrobat application.
                    objAcroApp.Exit
                    
                    'Inform the user about the error.
                    MsgBox "The field """ & strFieldNames(j) & """ could not be found!", vbCritical, "Field error"
                    
                    'Release the objects and exit.
                    Set objJSO = Nothing
                    Set objAcroPDDoc = Nothing
                    Set objAcroAVDoc = Nothing
                    Set objAcroApp = Nothing
                    Exit Sub
                    
                End If
            Next j
            
            'Fill the checkbox field.
            If shWrite.Cells(i, j + 1).Value = "True" Then
                objJSO.GetField(strFieldNames(11)).Value = "Yes"
            End If
            
            On Error GoTo 0
            
            'Create the output path, which will be like C:\Users\Christos\Desktop\Forms\01) First Name Last Name.pdf.
            With shWrite
                If i - 3 < 10 Then
                    strPDFOutPath = ThisWorkbook.Path & "\Forms\0" & i - 3 & ") " & .Cells(i, 2).Value & " " & .Cells(i, 3).Value & ".pdf"
                Else
                    strPDFOutPath = ThisWorkbook.Path & "\Forms\" & i - 3 & ") " & .Cells(i, 2).Value & " " & .Cells(i, 3).Value & ".pdf"
                End If
            End With
            
            'Save the form as new PDF file.
            objAcroPDDoc.Save 1, strPDFOutPath
    
            'Close the form without saving the changes.
            objAcroAVDoc.Close True
            
            'Close the Acrobat application.
            objAcroApp.Exit
               
            'Release the objects.
            Set objJSO = Nothing
            Set objAcroPDDoc = Nothing
            Set objAcroAVDoc = Nothing
            Set objAcroApp = Nothing
            
        Else
        
            MsgBox "Could not open the file!", vbCritical, "File error"
            
            'Close the Acrobat application.
            objAcroApp.Exit
            
            'Release the objects and exit.
            Set objAcroAVDoc = Nothing
            Set objAcroApp = Nothing
            Exit Sub
            
        End If
        
    Next i
    
    'Enable the screen.
    Application.ScreenUpdating = True
    
    'Inform the user that forms were filled.
    MsgBox "All forms were created successfully!", vbInformation, "Finished"
    
End Sub

The code for ReadPDFForms macro follows.

Option Explicit

Sub ReadPDFForms()
   
    '--------------------------------------------------------------------------------------
    'This macro loops through all the files of the specified folder (Forms). If the file
    'is PDF (PDF form here) the macro opens the file, reads specific fields the file
    'and writes the values in the sheet Read.
        
    'The code uses late binding, so no reference to external library is required.
    'However, the code works ONLY with Adobe Professional, so don't try to use it with
    'Adobe Reader because you will get an "ActiveX component can't create object" error.
    
    'Written by:    Christos Samaras
    'Date:          15/10/2013
    'e-mail:        xristos.samaras@gmail.com
    'site:          http://www.myengineeringworld.net
    '--------------------------------------------------------------------------------------

    'Declaring the necessary variables.
    Dim strFormsFolder          As String
    Dim strFieldNames(1 To 7)   As String
    Dim objFSO                  As Object
    Dim objSourceFolder         As Object
    Dim objFileItem             As Object
    Dim j                       As Integer
    Dim LastRow                 As Long
    Dim objAcroApp              As Object
    Dim objAcroAVDoc            As Object
    Dim objAcroPDDoc            As Object
    Dim objJSO                  As Object
    Dim strPDFOutPath           As String
           
    'Disable screen flickering.
    Application.ScreenUpdating = False
        
    'Specify the folder that contains the PDF forms.
    'Full path example (note the \ at the end):
    'strPDFPath = "C:\Users\Christos\Desktop\Forms\"
    'Using workbook path:
    strFormsFolder = ThisWorkbook.Path & "\Forms\"
    
    'Set the required field names in the PDF form.
    strFieldNames(1) = "First Name"
    strFieldNames(2) = "Last Name"
    strFieldNames(3) = "City"
    strFieldNames(4) = "Country"
    strFieldNames(5) = "E-mail"
    strFieldNames(6) = "Type Of Registration"
    strFieldNames(7) = "Previous Attendee"
    
    On Error Resume Next
    
    'Create the File System object.
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    
    'Check if the object was created.
    If Err.Number <> 0 Then
        MsgBox "Could not create the File System object!", vbCritical, "Object error"
        'Release the object and exit.
        Set objFSO = Nothing
        Exit Sub
    End If
    On Error GoTo 0
                    
    'Get information about the Forms folder.
    Set objSourceFolder = objFSO.GetFolder(strFormsFolder)
    
    'Loop through all the files found in the folder Forms.
    For Each objFileItem In objSourceFolder.Files
        
        'Check if the file is pdf.
        If LCase(Right(objFileItem.Path, 3)) = "pdf" Then
            
            'Find the last row of data in sheet Read.
            With shRead
                .Activate
                LastRow = .Cells(.Rows.Count, "B").End(xlUp).Row
            End With
        
            On Error Resume Next
            
            'Initialize Acrobat by creating the App object.
            Set objAcroApp = CreateObject("AcroExch.App")
            
            'Check if the object was created.
            If Err.Number <> 0 Then
                MsgBox "Could not create the App object!", vbCritical, "Object error"
                'Release the objects and exit.
                Set objAcroApp = Nothing
                Set objFileItem = Nothing
                Set objSourceFolder = Nothing
                Set objFSO = Nothing
                Exit Sub
            End If
            
            'Create the AVDoc object.
            Set objAcroAVDoc = CreateObject("AcroExch.AVDoc")
            
            'Check if the object was created.
            If Err.Number <> 0 Then
                MsgBox "Could not create the AVDoc object!", vbCritical, "Object error"
                'Release the objects and exit.
                Set objAcroAVDoc = Nothing
                Set objAcroApp = Nothing
                Set objFileItem = Nothing
                Set objSourceFolder = Nothing
                Set objFSO = Nothing
                Exit Sub
            End If
            
            On Error GoTo 0
        
            'Open the PDF file.
            If objAcroAVDoc.Open(objFileItem.Path, "") = True Then
            
                'Set the PDDoc object.
                Set objAcroPDDoc = objAcroAVDoc.GetPDDoc
               
                'Set the JS Object - Java Script Object.
                Set objJSO = objAcroPDDoc.GetJSObject
                
                'Create a counter in sheet Read (counts each PDF file).
                shRead.Cells(LastRow + 1, 1).Value = LastRow - 2
                
                On Error Resume Next
                
                'Read the form fields.
                For j = 1 To 6
                    
                    shRead.Cells(LastRow + 1, j + 1).Value = objJSO.GetField(strFieldNames(j)).Value
                    
                    If Err.Number <> 0 Then
                        
                        'Close the form without saving the changes.
                        objAcroAVDoc.Close True
                        
                        'Close the Acrobat application.
                        objAcroApp.Exit
                        
                        'Inform the user about the error.
                        MsgBox "The field """ & strFieldNames(j) & """ could not be found!", vbCritical, "Field error"
                        
                        'Release the objects and exit.
                        Set objJSO = Nothing
                        Set objAcroPDDoc = Nothing
                        Set objAcroAVDoc = Nothing
                        Set objAcroApp = Nothing
                        Set objFileItem = Nothing
                        Set objSourceFolder = Nothing
                        Set objFSO = Nothing
                        Exit Sub
                        
                    End If
                Next j
                
                'Read the checkbox field.
                If objJSO.GetField(strFieldNames(7)).Value = "Yes" Then
                    shRead.Cells(LastRow + 1, j + 1).Value = "True"
                Else
                    shRead.Cells(LastRow + 1, j + 1).Value = "False"
                End If
                
                On Error GoTo 0
                
                'Close the PDF file without saving the changes.
                objAcroAVDoc.Close True
                
                'Close the Acrobat application.
                objAcroApp.Exit
               
                'Release the objects.
                Set objJSO = Nothing
                Set objAcroPDDoc = Nothing
                Set objAcroAVDoc = Nothing
                Set objAcroApp = Nothing
    
            Else
            
                MsgBox "Could not open the file!", vbCritical, "File error"
                
                'Close the Acrobat application.
                objAcroApp.Exit
            
                'Release the objects abd exit.
                Set objAcroAVDoc = Nothing
                Set objAcroApp = Nothing
                Set objFileItem = Nothing
                Set objSourceFolder = Nothing
                Set objFSO = Nothing
                
            End If
        
        End If
        
        'Renew the last row value.
        LastRow = LastRow + 1
            
    Next objFileItem
    
    'Release the objects.
    Set objFileItem = Nothing
    Set objSourceFolder = Nothing
    Set objFSO = Nothing
    
    'Adjust the columns' width.
    shRead.Columns("A:H").AutoFit
    
    'Enable the screen.
    Application.ScreenUpdating = True
    
    'Inform the user that the data acquisition finished.
    MsgBox "All data from the forms were retrieved successfully!", vbInformation, "Finished"
    
End Sub



How to find the field names of a PDF form?



Edit PDF Form

Open the PDF form using Adobe Professional and follow the next steps:
  1. Press the Tools button to show the Tools panel.
  2. Select the Forms Tab.
  3. Press the Edit button.
Instead of these three steps, you can just press CTRL + SHIFT + 7.

PDF Field Names

The field names will be shown (highlighted in black color), so you can edit them according to your needs. If you need to retrieve the names of the form fields programmatically, you can use the following VBA code.

Option Explicit

Sub RetrievePDFFormNames()
      
    '--------------------------------------------------------------------------------------
    'This macro opens a PDF form, loops through all of its fields and prints their names
    'in the immediate window of VBA editor (in order to be further processed).
        
    'The code uses late binding, so no reference to external library is required.
    'However, the code works ONLY with Adobe Professional, so don't try to use it with
    'Adobe Reader because you will get an "ActiveX component can't create object" error.
    
    'Written by:    Christos Samaras
    'Date:          15/10/2013
    'e-mail:        xristos.samaras@gmail.com
    'site:          http://www.myengineeringworld.net
    '--------------------------------------------------------------------------------------

    'Declaring the necessary variables.
    Dim strPDFPath              As String
    Dim objAcroApp              As Object
    Dim objAcroAVDoc            As Object
    Dim objAcroForm             As Object
    Dim objobjFields            As Object
    Dim objField                As Object
    
    'Specify the path of the sample PDF form.
    'Full path example:
    'strPDFPath = "C:\Users\Christos\Desktop\Test Form.pdf"
    'Using workbook path:
    strPDFPath = ThisWorkbook.Path & "\" & "Test Form.pdf"
        
    On Error Resume Next
    
    'Initialize Acrobat by creating the App object.
    Set objAcroApp = CreateObject("AcroExch.App")
    
    'Check if the object was created.
    If Err.Number <> 0 Then
        MsgBox "Could not create the App object!", vbCritical, "Object error"
        'Release the object and exit.
        Set objAcroApp = Nothing
        Exit Sub
    End If
    
    'Create the AVDoc object.
    Set objAcroAVDoc = CreateObject("AcroExch.AVDoc")
    
    'Check if the object was created.
    If Err.Number <> 0 Then
        MsgBox "Could not create the AVDoc object!", vbCritical, "Object error"
        'Release the objects and exit.
        Set objAcroAVDoc = Nothing
        Set objAcroApp = Nothing
        Exit Sub
    End If
    
    On Error GoTo 0
        
    'Open the PDF file.
    If objAcroAVDoc.Open(strPDFPath, "") = True Then
         
        'Show the application.
        objAcroApp.Show
        
        On Error Resume Next
        
        'Create the Acrobat Form object.
        Set objAcroForm = CreateObject("AFormAut.App")
        
        'Check if the object was created.
        If Err.Number <> 0 Then
            MsgBox "Could not create the AFormAutobject!", vbCritical, "Object error"
            'Release the objects and exit.
            Set objAcroAVDoc = Nothing
            Set objAcroApp = Nothing
            Exit Sub
        End If
        
        On Error GoTo 0
        
        'Get the fields of the form.
        Set objobjFields = objAcroForm.Fields
        
        'Print the number of form fields.
        Debug.Print "Total form Fields = " & objobjFields.Count
        
        'Print the name of the fields and their values (if there are any).
        For Each objField In objobjFields
            Debug.Print objField.Name & " = " & objField.Value
        Next objField
        
    Else
        
        Debug.Print "Could not open the file!"
        
        'Close the Acrobat application.
        objAcroApp.Exit
        
        'Release the objects and exit.
        Set objAcroAVDoc = Nothing
        Set objAcroApp = Nothing
        Exit Sub
            
    End If
    
    'Close the form without saving the changes.
    objAcroAVDoc.Close True
    
    'Close the Acrobat application.
    objAcroApp.Exit
       
    'Release the objects.
    Set objField = Nothing
    Set objobjFields = Nothing
    Set objAcroForm = Nothing
    Set objAcroAVDoc = Nothing
    Set objAcroApp = Nothing
    
End Sub

PDF Field Names - Immediate Window

The results will appear in the immediate window of the VBA editor. Then you can copy and use them as you need.



Demonstration video



The short video below demonstrates the WritePDFForms and ReadPDFForms macros in action.




Download it from here



Download

The zip file contains an Excel workbook that contains the three macros presented above, the sample PDF form and an empty folder (Forms). The workbook can be opened with Excel 2007 or newer.

Did you like this post? If yes, then share it with your friends. Thank you!


Categories:


Mechanical Engineer (Ph.D. cand.), M.Sc. Cranfield University, Dipl.-Ing. Aristotle University, Thessaloniki - Greece.
Communication: e-mail, Facebook, Twitter, Google+ and Linkedin. More info