Reading and processing Excel data | Sololearn: Learn to code for FREE!
+ 5

Reading and processing Excel data

Hello SoloLearners, I need to process a very large Excel file that contains multiple data types. Do you have any suggestions on how to read and process this type of file efficiently? I can easily convert the file to .csv if that helps. Any language will be fine (C++, Java or Python are preferred). Please share any code that you have used and that worked well for you. Thanks

20th Jan 2018, 3:06 AM
Red Hawks
9 Answers
+ 10
I've played around with this, and the best option by far is xlwings. Try it first: www.xlwings.org/ Examples: https://www.xlwings.org/examples
20th Jan 2018, 7:11 AM
Louis
+ 9
@Red Hawks, can you please give more details on what "process" means here: what needs to be processed, and what kind of output is expected? It would also help if you specified the MS Office version, as Microsoft changed its default file format in Office 2007, IIRC. Since then, Excel has saved workbooks in the XML-based Office Open XML format (.xlsx) rather than the older binary .xls format, but I guess you already know that : )
20th Jan 2018, 5:25 AM
Ipang
+ 8
Thanks @Ipang for offering to help. The Excel file contains both string data and numerical data. We want to run some basic statistics such as counting all examples that meet various criteria (e.g. gender). As of now we are planning to convert the Excel file into CSV. Cheers!
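A minimal sketch of this kind of count in Python, assuming the Excel file has already been exported to CSV with a header row. The column name `gender` and the helper names `count_by_column` and `count_csv` are hypothetical, chosen only for illustration. Rows are read one at a time, so the whole file does not need to fit in memory.

```python
import csv
from collections import Counter


def count_by_column(rows, column):
    """Count how many rows have each distinct value in `column`.

    `rows` is any iterable of dicts (for example a csv.DictReader),
    so the data is streamed rather than loaded all at once.
    """
    counts = Counter()
    for row in rows:
        # Normalize whitespace and case so "F" and "f " are counted together.
        value = (row.get(column) or "").strip().lower()
        counts[value] += 1
    return counts


def count_csv(path, column):
    # newline="" is what the csv module documentation recommends for files.
    with open(path, newline="", encoding="utf-8") as f:
        return count_by_column(csv.DictReader(f), column)
```

Calling `count_csv("data.csv", "gender")` (with a hypothetical file name) would return a `Counter` mapping each normalized value to its number of rows.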
21st Jan 2018, 1:34 AM
Red Hawks
+ 7
Thank you @Louis, one of my friends is planning to use Python for this program and your link looks interesting.
21st Jan 2018, 1:37 AM
Red Hawks
+ 6
Thanks @Kinshuk I will give it a try.
20th Jan 2018, 3:33 AM
Red Hawks
+ 5
Thanks for the info @John
20th Jan 2018, 3:40 AM
Red Hawks
+ 4
This is the starting point of Microsoft's documentation for the Excel binary format. Years ago I did the same thing for Word and ran into bumps where the documentation was hard to figure out. https://msdn.microsoft.com/en-us/library/office/gg615597(v=office.14).aspx
20th Jan 2018, 3:39 AM
John Wells
+ 4
If you choose to work with the binary format, plan on creating lots of test files that each use a single feature, so you know what to expect. Write a program that dumps a file in hexadecimal so you can compare its records with the documentation before you code anything. It will make your job much easier.
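A hex dump tool of that kind can be sketched in a few lines of Python. This is only an illustration: the function names `hex_dump` and `dump_file`, the 16-byte row width, and the 512-byte limit are assumptions, not anything prescribed by the Excel documentation.

```python
def hex_dump(data, width=16):
    """Return an offset / hex / ASCII dump of `data` (a bytes object)."""
    pad = width * 3 - 1  # width of a full row of "xx xx xx ..." text
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


def dump_file(path, limit=512):
    # Binary mode avoids any text decoding; only the first `limit` bytes are read.
    with open(path, "rb") as f:
        print(hex_dump(f.read(limit)))
```

Dumping two test files that differ by a single feature and comparing the output side by side makes it easier to see which bytes that feature adds.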
20th Jan 2018, 3:48 AM
John Wells
+ 2
https://code.sololearn.com/c8mbun90l7HR/?ref=app I made this for someone else, but it didn't help them much. This version currently reads integers only. To read strings as well, you can use a class: a loop that reads each line as a string and splits it accordingly is all you need. I am working on the class version and will post it soon.
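The read-a-line-and-split idea can be sketched roughly as follows in Python. This is not the linked code: `parse_field` and `parse_line` are hypothetical names, and the sketch assumes comma-separated fields that never contain commas or quotes themselves.

```python
def parse_field(text):
    """Convert one field to int or float when possible, else keep the string."""
    text = text.strip()
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass  # not a number of this kind; try the next conversion
    return text


def parse_line(line, delimiter=","):
    # A plain split is only safe when fields never contain the delimiter
    # or quotes; for real CSV exports the csv module is the safer choice.
    return [parse_field(field) for field in line.rstrip("\n").split(delimiter)]
```

For example, `parse_line("Ann, 30,1.5,F\n")` yields a list holding a string, an integer, a float, and a string, so mixed numeric and text columns can be handled in one pass.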
20th Jan 2018, 3:31 AM
Kinshuk Vasisht