Aye, and I'll probably still do it. I recently watched a Python video where knowing how the architecture worked yielded several thousand times better runtime (in Cython) just by rearranging a few operations. I've also beaten optimizers by inlining assembly. There are also a number of instructions that you can't reach without assembly and that optimizers don't select.
Dear John, a knowledgeable and veteran programmer like you is a treasure for this community. Hopefully, learners like me can learn some REAL, hard-won lessons and techniques from you.
I've programmed over 20 different hardware instruction sets starting with CDC Mainframe at college followed by IBM 360 mainframe and DEC System 10. Then various mini computers like DEC PDP. Finally, tons of microprocessors.
@Tor, if each instruction you write executes as one instruction on the hardware, yes. I don't know either of those. Instructions look like:
LOD A,#1200
LOD B,#1202
ADD A,B
STO A,#1204
This loads the data at 1200 and 1202 into registers A and B, adds them (updating A), and stores the result at 1204. C++ or Java would code: int a,b,result; ... result=a+b;
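For anyone who reads C more easily than assembly, here is a rough sketch of what those four pseudo-instructions do. It is only a model, not real hardware: the `memory` array, `MEM_WORDS`, and the `load_add_store` function are names I made up, and I'm assuming the addresses index whole words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word-addressed memory; the size is an arbitrary assumption. */
enum { MEM_WORDS = 4096 };
static int32_t memory[MEM_WORDS];

/* Models the LOD / LOD / ADD / STO sequence with two "registers" a and b. */
void load_add_store(size_t src1, size_t src2, size_t dst) {
    assert(src1 < MEM_WORDS && src2 < MEM_WORDS && dst < MEM_WORDS);
    int32_t a = memory[src1];   /* LOD A,#1200 : load first operand into A  */
    int32_t b = memory[src2];   /* LOD B,#1202 : load second operand into B */
    a = a + b;                  /* ADD A,B     : A = A + B                  */
    memory[dst] = a;            /* STO A,#1204 : store A back to memory     */
}
```

Calling `load_add_store(1200, 1202, 1204)` after putting values in `memory[1200]` and `memory[1202]` mirrors the sequence in the post. A compiler turns `result=a+b;` into something of the same shape, though a real one may keep the values in registers and skip the loads and stores entirely.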
Back in the 90s I messed around with it, but it was only useful to me as a learning experience and nothing more. I haven't messed with it since then.
It helps to be able to read assembly even if you'll write in a higher-level language. There are some real gems there, e.g., this assembly I pulled forward (from AMD's 64-bit processors guide): https://code.sololearn.com/cZX2I8giW2zV/?ref=app "Population count" counts the bits set in a bit string, in parallel. I also included an inline "BSR", which returns the index of the most significant set bit in a single instruction. Though trivial, it's something you won't usually see from a compiler's optimizer.
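In case the link doesn't open, here is a sketch of the same two ideas in plain C. This is the widely known parallel (SWAR) bit-counting trick, not necessarily line for line what the linked code does, and the function names `popcount32` and `highest_set_bit` are my own. The second function is a portable loop standing in for BSR, so it shows the result BSR gives rather than the single-instruction speed.

```c
#include <assert.h>
#include <stdint.h>

/* Parallel population count: sums bits in 2-bit, then 4-bit, then 8-bit
   fields, so all fields are processed at once instead of bit by bit. */
uint32_t popcount32(uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);                 /* 16 two-bit counts   */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* 8 four-bit counts   */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* 4 byte-wide counts  */
    return (v * 0x01010101u) >> 24;                   /* add the four bytes  */
}

/* Portable stand-in for BSR: index (0..31) of the most significant set bit.
   Like BSR, the result is not meaningful for zero, so zero is rejected. */
int highest_set_bit(uint32_t v) {
    assert(v != 0);
    int index = 0;
    while (v >>= 1) {
        index++;
    }
    return index;
}
```

For example, `popcount32(0xFFu)` gives 8 and `highest_set_bit(0x10u)` gives 4. Many compilers also expose both operations as builtins or intrinsics, which is usually the easier route than inline assembly if your toolchain has them.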
In college, I convinced my professor to spend two sessions of the intro to programming class looking at the assembly output of the students' working basic program (IBM 360 code). After the programs were graded, I took the best-written one and generated the code to run it. I taught the class in two parts: architecture and simple examples of usage on day one, then a walk through the program's code on day two. It was a major hit, and the professor continued teaching the course that way.
Does TIS-100 video game count? Or SHENZHEN I/O?
I use 8051/52, AVR and ARM assembly on embedded systems. I started with 6809E and Z80 assembly in the 1980s on home computers and CP/M machines. Then Intel 80x86 during the early PC years, and M68000 on the Sega Genesis and Palm Pilot PDAs (actually a Dragonball processor). These days I like embedded C with hand-optimized asm.