Channel: Recent Gists from 84adam

Extract text from a PDF given its URL

January 2, 2020, 10:12 am

pdf_url_text.py

	import requests
	from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
	from pdfminer.converter import TextConverter
	from pdfminer.layout import LAParams
	from pdfminer.pdfpage import PDFPage
	from io import StringIO, BytesIO

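	def convert_pdf_to_txt(url, pages=None):
	    """Download the PDF at `url` and return its text.

	    `pages` is an optional iterable of zero-based page numbers;
	    if it is omitted or empty, every page is extracted.
	    """
	    pagenums = set(pages) if pages else set()

	    output = StringIO()
	    manager = PDFResourceManager()
	    converter = TextConverter(manager, output, laparams=LAParams())
	    interpreter = PDFPageInterpreter(manager, converter)

	    # Fetch the PDF and wrap the raw bytes in a file-like object for pdfminer.
	    r = requests.get(url)
	    r.raise_for_status()  # fail on 404 etc. instead of parsing an HTML error page
	    infile = BytesIO(r.content)

	    # An empty set tells PDFPage.get_pages to yield every page.
	    for page in PDFPage.get_pages(infile, pagenums):
	        interpreter.process_page(page)

	    infile.close()
	    converter.close()
	    text = output.getvalue()
	    output.close()
	    return text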

	if __name__ == '__main__':
	    url = input("Enter URL of PDF from which to extract text: ")
	    # Example URL: https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf
	    # Output:
	    # >>> Dummy PDF file

	    output = convert_pdf_to_txt(url)
	    print(output)
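
The `pages` argument of `convert_pdf_to_txt` is easy to misuse: pdfminer's `PDFPage.get_pages` compares it against zero-based page indices, and an empty set means "every page". If you want to accept human-friendly, 1-based ranges such as `1-3,5` (for example from the `input()` prompt), a minimal sketch of a hypothetical `parse_page_ranges` helper might look like the following. It is not part of the original gist and assumes inclusive ranges separated by commas.

```python
from __future__ import annotations


def parse_page_ranges(spec: str) -> set[int]:
    """Turn a 1-based spec like "1-3,5" into a set of zero-based page numbers.

    Hypothetical helper: the gist itself takes `pages` as an iterable of
    zero-based indices and does not parse strings.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3" or a trailing ","
        if "-" in part:
            start_str, end_str = part.split("-", 1)
            start, end = int(start_str), int(end_str)
            if start < 1 or end < start:
                raise ValueError(f"bad page range: {part!r}")
            # 1-based inclusive range -> zero-based indices
            pages.update(range(start - 1, end))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"bad page number: {part!r}")
            pages.add(page - 1)
    return pages


if __name__ == "__main__":
    print(sorted(parse_page_ranges("1-3, 5")))  # [0, 1, 2, 4]
```

The result can be passed straight through, e.g. `convert_pdf_to_txt(url, pages=parse_page_ranges("1-3,5"))`. An empty spec yields an empty set, which `convert_pdf_to_txt` already treats as "extract all pages".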
