Parsing pdf file buffer | Sololearn: Learn to code for FREE!

I'm working on an npm package for file operations and am now trying to add PDF processing support to it. I'm having trouble with the Buffer I get after reading the file with the fs module and converting it to a string. How can I extract the text and other content from the document? The response I get looks something like this:

%PDF-1.5 %����
1 0 obj <</Type/Catalog/Pages 2 0 R/Lang(en-US) /StructTreeRoot 32> endobj
2 0 obj <</Type/Pages/Count 2/Kids[ 3 0 R 26 0 R] >> endobj
3 0 obj <</Type/Page/Parent 2 0 R/Resources<</XObject<</Image5 5 0> endobj
4 0 obj <</Filter/FlateDecode/Length 3185>> stream
x��[Ys�8 ... (followed by more unreadable binary characters)

8th Feb 2020, 4:22 PM
Antony O. Onyango
1 Answer
I suspect we can't read it using ostream... I would be happy to know if anyone has a different view on this. You might like to try this: https://sourceforge.net/projects/libharu/
12th Feb 2020, 2:59 PM
Ketan Lalcheta