 
Reading PDF file to text in c#
Description: This article shows how to read a PDF file and put its contents into a string variable very simply.
Author: Sumit Gupta   Posted On: 30 Apr 2011
Tags: ASP.NET, C#


I used PDFBox. PDFBox is a Java PDF library, but a .NET version is also available.

The first step is to download PDFBox from http://sourceforge.net/projects/pdfbox/files/

Then add references to the following two files from the bin directory of the downloaded package:

PDFBox-0.7.2.dll

IKVM.GNU.Classpath.dll

Then put the following code in a class file to read a PDF file:

using System;
using org.pdfbox.pdmodel;
using org.pdfbox.util;

/// <summary>
/// Extracts the text content of a PDF file using PDFBox.
/// </summary>
public class ConvertFromPDF
{
    public static string parseUsingPDFBox(string filename)
    {
        PDDocument doc = null;
        try
        {
            // Load the PDF from disk.
            doc = PDDocument.load(filename);
            // PDFTextStripper walks the pages and collects their text.
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(doc);
        }
        finally
        {
            // Close the document so the file and its resources are released.
            if (doc != null) doc.close();
        }
    }
}
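
The class above can be called from any page or class in the project. As a rough sketch of one way to do that, the helper below checks that the file exists before handing the path to an extractor function. The names PdfTextReader and ReadText are hypothetical and not part of PDFBox or the original sample; the extractor is passed in as a delegate so that this snippet compiles on its own, without the PDFBox references.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of PDFBox): validates the path, then
// delegates the actual text extraction to the function passed in.
public static class PdfTextReader
{
    public static string ReadText(string path, Func<string, string> extract)
    {
        if (string.IsNullOrEmpty(path))
            throw new ArgumentException("A file path is required.", "path");
        if (extract == null)
            throw new ArgumentNullException("extract");
        if (!File.Exists(path))
            throw new FileNotFoundException("PDF file not found.", path);

        // For the class in this article, 'extract' would be
        // ConvertFromPDF.parseUsingPDFBox.
        return extract(path);
    }
}
```

With the PDFBox references added, a call might look like `string text = PdfTextReader.ReadText(@"C:\docs\sample.pdf", ConvertFromPDF.parseUsingPDFBox);`, where the file path is only an illustration.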

 

About Author

I am Sumit Gupta, working at 3 Pillar Global Pvt. Ltd. as a Module Lead. I have 7+ years of experience in .NET technologies. I love to explore new technologies and write technical articles.
Country India
Company 3 Pillar Global Pvt. Ltd.
Home Page http://www.facebook.com/sumitgupta1225


Comments

 
 
Posted By Akhil on 17 Aug 2011 at 09:49 PM
 
Very helpful topic, great man...
 
 
   