
Parsing a PDF file buffer

I'm working on an npm package for file operations. I'm now trying to add PDF processing support to it, but I'm having trouble working with the Buffer I get back after reading the file with the fs module and converting the response to a string. How can I extract text and other content from the document? The response I get looks something like this: %PDF-1.5 %�� 1 0 obj <</Type/Catalog/Pages 2 0 R/Lang(en-US) /StructTreeRoot 32> endobj 2 0 obj <</Type/Pages/Count 2/Kids[ 3 0 R 26 0 R] >> endobj 3 0 obj <</Type/Page/Parent 2 0 R/Resources<</XObject<</Image5 5 0> endobj 4 0 obj <</Filter/FlateDecode/Length 3185>> stream, followed by a long run of unreadable binary characters.
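To illustrate why the stringified output above is unreadable, here is a minimal sketch using only Node's built-in `zlib` module. It assumes the page content sits in `/FlateDecode` streams (as the `4 0 obj` dictionary in the dump suggests) and that the text is drawn with simple `(…) Tj` operators. The helper names `isPdf`, `survivesUtf8RoundTrip`, `inflateStreams` and `extractTjText` are made up for this example, and the `stream`/`endstream` scan is deliberately naive: a real parser would follow the cross-reference table and the `/Length` entry, and would handle other filters and font encodings.

```javascript
const zlib = require('zlib');

// Hypothetical helper: does the buffer start with the "%PDF-" signature?
function isPdf(buf) {
  return buf.length >= 5 && buf.toString('latin1', 0, 5) === '%PDF-';
}

// Hypothetical helper: shows whether a UTF-8 string conversion is lossless.
// Binary bytes that are not valid UTF-8 become U+FFFD, so this is usually
// false for a real PDF, which is why the Buffer should be kept as a Buffer.
function survivesUtf8RoundTrip(buf) {
  return Buffer.from(buf.toString('utf8'), 'utf8').equals(buf);
}

// Hypothetical helper: naively scan for "stream ... endstream" spans and
// try to inflate each one. Spans that are not zlib data are skipped.
function inflateStreams(buf) {
  const results = [];
  let pos = 0;
  for (;;) {
    const keyword = buf.indexOf('stream', pos, 'latin1');
    if (keyword === -1) break;
    let dataStart = keyword + 'stream'.length;
    // The keyword is followed by CRLF or LF before the data begins.
    if (buf[dataStart] === 0x0d) dataStart++;
    if (buf[dataStart] === 0x0a) dataStart++;
    const dataEnd = buf.indexOf('endstream', dataStart, 'latin1');
    if (dataEnd === -1) break;
    try {
      results.push(zlib.inflateSync(buf.subarray(dataStart, dataEnd)));
    } catch (err) {
      // Uncompressed stream or a different filter: ignore it in this sketch.
    }
    pos = dataEnd + 'endstream'.length;
  }
  return results;
}

// Hypothetical helper: pull literal strings out of "(text) Tj" operators.
// Hex strings, TJ arrays and custom font encodings are not handled.
function extractTjText(content) {
  const text = content.toString('latin1');
  const found = [];
  const re = /\(((?:\\.|[^\\()])*)\)\s*Tj/g;
  let match;
  while ((match = re.exec(text)) !== null) found.push(match[1]);
  return found;
}

// Tiny in-memory stand-in for a file read with fs.readFileSync(path)
// (called without an encoding argument, so the result stays a Buffer).
const compressed = zlib.deflateSync(Buffer.from('BT /F1 12 Tf (Hello PDF) Tj ET', 'latin1'));
const sample = Buffer.concat([
  Buffer.from('%PDF-1.5\n%\xff\xff\n4 0 obj\n<</Filter/FlateDecode>>\nstream\n', 'latin1'),
  compressed,
  Buffer.from('\nendstream\nendobj\n', 'latin1'),
]);

console.log(isPdf(sample), survivesUtf8RoundTrip(sample));
console.log(inflateStreams(sample).map(extractTjText));
```

The main point is to do all searching and slicing on the Buffer itself and only decode to a string after a stream has been decompressed. For anything beyond a quick experiment, a dedicated PDF parsing library is likely the more practical route.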

javascript python c++ file-processing pdf-parser

8th Feb 2020, 4:22 PM

Antony O. Onyango

1 Answer


I suspect we can't read a PDF's contents using a plain stream such as ostream... I would be happy to know if anyone has a different view on this. You might like to try the library below: https://sourceforge.net/projects/libharu/

12th Feb 2020, 2:59 PM

Ketan Lalcheta