Last Update: 2009-11-29
At almost regular intervals the debate of RISC vs. CISC sweeps
through the comp.arch newsgroup. John
Mashey of SGI has compiled an extensive article in multiple parts on
that subject that he reposts on these occasions. I have compiled one
of his reposts into this HTML document for easier reading, especially
of the tables. I've tried to preserve the original text and
formatting where possible; only a few obvious typos have been fixed.
The original Usenet posting from which I derived this document should
still be available on Google Groups.
This article is made available on the WWW by permission of the
author, John Mashey.
Achim Gratz
RISC vs. CISC
Path: irz401!fu-berlin.de!news.apfel.de!cpk-news-hub1.bbnplanet.com!su-news-feed4.bbnplanet.com!news.bbnplanet.com!enews.sgi.com!news.corp.sgi.com!mash.engr.sgi.com!mash
From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.arch
Subject: Re: RISC vs CISC ... [the big posting: for the N+1st time]
Date: 3 Jun 1997 20:18:01 GMT
Organization: Silicon Graphics, Inc.
Lines: 823
Message-ID: <5n1u5p$5hn$2@murrow.corp.sgi.com>
References: <5qsxov$gkm@skyway.bridge.net> <5mnh29$72p$1@murrow.corp.sgi.com> <5mnksb$btk@tooting.netapp.com> <338F9902.5016@OntheNet.com.au> <5mtanv$en2$1@sue.cc.uregina.ca>
NNTP-Posting-Host: mash.engr.sgi.com
Xref: irz401 comp.arch:39141
Long-time readers of this newsgroup can stop here; you've seen it
before, but once again, confusion is breaking out.
Note: for a shorter and more coherent form of this, read
Microprocessor Report, March 26, 1992, "CISCs are Not RISCs, and Not
Converging Either."
Article: 46782 of comp.arch
Newsgroups: comp.arch
From: mash@mash.engr.sgi.com (John R. Mashey)
Subject: Re: Exception handling in PowerPC [no: RISC-vs-CISC, one more time]
Organization: Silicon Graphics, Inc.
Date: Tue, 28 Feb 95 13:11:47 1995
In article <3ibmb6$so1@newsbf02.news.aol.com>, danhicks@aol.com (DanHicks) writes:
Oh? Consider the problem of designing exception handling for the
P6. It's pipelining, out-of-order and speculative execution that make
exception handling tough, and both the CISC and RISC people are doing
as much of this as they can. It has little to do with the instruction
set.
True, but, as you point out, the distinction between RISC and CISC
is disappearing, to the point where the main distinction is
psychological. And that's the biggest problem: The RISC designers
can't be convinced of the importance of designing an architecture
suited for something other than benchmarks, whereas the CISC designers
realize that accommodating operating system requirements is a critical
part of their job.
- Attached is the Nth repost of the discussion of RISC-vs-CISC
non-convergence, i.e., architecture != implementation, one more
time. If you've followed this newsgroup for a while, you've
seen this before. I have included some followon discussions,
and done minuscule editing.
- The comments about RISC designers above are simply wrong; "All
generalizations are false", but especially this one.
- re: Exception-handling: as I've noted for years in public talks
(the "Car" talk especially), tricky exception-handling is the
price for machines with increased parallelism and overlap.
Errata sheets are customarily filled with bugs around exception
cases. Oddly enough, speculative-execution, o-o-o machines may
actually fare better than you'd think, given that they already
need mechanisms for undoing (more-or-less) completed
instructions. From history, recall that the 360/91, circa 1967,
had some imprecise exceptions, as did the 360/67 (early 360 with
virtual memory), so exciting exception handling has been with
us for a while.
====REPOST====
Article 22850 of comp.arch:
Path: mips!mash
Subject: Nth re-posting of CISC vs RISC (or what is RISC, really)
Message-ID: <2419@spim.mips.COM>
Most of you have seen most of this several times before; there is a
little editing, nothing substantial. Some followup comments have been
added.
- PART I - ARCHITECTURE, IMPLEMENTATION, DIFFERENCES
- PART II - ADDRESSING MODES
- PART III - MORE ON TERMINOLOGY; WOULD YOU CALL THE CDC 6600 A RISC?
- PART IV - RISC, VLIW, STACKS
PART I - ARCHITECTURE, IMPLEMENTATION, DIFFERENCES
WARNING: you may want to print this one to read it...
(from preceding discussion):
Anyway, it is not a fair comparison. Not by a long stretch.
Let's see how the Nth generation SPARC, MIPS, and 88K's do (assuming
they last) compared to some new design from scratch.
Well, there is baggage and there is BAGGAGE. One must be careful
to distinguish between ARCHITECTURE and IMPLEMENTATION:
- Architectures persist longer than implementations, especially
user-level Instruction-Set Architecture.
- The first member of an architecture family is usually designed
with the current implementation constraints in mind, and if
you're lucky, software people had some input.
- If you're really lucky, you anticipate 5-10 years of technology
trends, and that modifies your idea of the ISA you commit to.
- It's pretty hard to delete anything from an ISA, except where:
- You can find that NO ONE uses a feature (the 68020->68030
deletions mentioned by someone else).
- You believe that you can trap and emulate the feature
"fast enough". i.e., microVAX support for decimal ops,
68040 support for transcendentals.
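(To make the trap-and-emulate idea concrete, here is a small illustrative C
sketch; it is not from the original posting and is not any real machine's
mechanism. It is a toy dispatch loop in which an opcode dropped from
"hardware" is caught and handled by a software routine, roughly the way an
unimplemented-instruction trap hands control to an emulation handler. All
opcode names are invented for the example.)

    #include <stdint.h>
    #include <stdio.h>

    /* Toy ISA: 3-byte instructions <op, rd, rs>.  OP_DECADD stands for a
       feature the newer hardware no longer implements directly. */
    enum { OP_HALT, OP_ADD, OP_DECADD };

    static int reg[4];

    /* The software emulation routine a trap handler would invoke. */
    static void emulate_decadd(int rd, int rs)
    {
        reg[rd] += reg[rs];               /* slow software version of the old op */
    }

    static void run(const uint8_t *code)
    {
        for (int pc = 0; ; pc += 3) {
            uint8_t op = code[pc], rd = code[pc + 1], rs = code[pc + 2];
            if (op == OP_HALT)
                return;
            else if (op == OP_ADD)        /* still implemented "in hardware" */
                reg[rd] += reg[rs];
            else if (op == OP_DECADD)
                emulate_decadd(rd, rs);   /* "trap" to software, then resume */
            else
                fprintf(stderr, "illegal instruction %u at %d\n", op, pc);
        }
    }

    int main(void)
    {
        const uint8_t prog[] = { OP_ADD, 0, 1, OP_DECADD, 0, 2, OP_HALT, 0, 0 };
        reg[1] = 2; reg[2] = 3;
        run(prog);
        printf("r0 = %d\n", reg[0]);      /* prints 5 */
        return 0;
    }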
Now, one might claim that the i486 and 68040 are RISC
implementations of CISC architectures.... and I think there is some
truth to this, but I also think that it can confuse things badly:
Anyone who has studied the history of computer design knows that
high-performance designs have used many of the same techniques for years,
for all of the natural reasons, that is:
- They use as much pipelining as they can, in some cases, if
this means a high gate-count, then so be it.
- They use caches (separate I & D if convenient).
- They use hardware, not micro-code for the simpler operations.
(For instance, look at the evolution of the S/360 products.
Recall that the 360/85 used caches, back around 1969, and within a few
years, so did any mainframe or supermini.)
Question: So, what difference is there among machines if similar
implementation ideas are used?
Answer: There is a very specific set of characteristics shared by most
machines labeled RISCs, most of which are not shared by most CISCs.
The RISC characteristics:
- Are aimed at more performance from current compiler technology
(i.e., enough registers). OR
- Are aimed at fast pipelining in a virtual-memory environment
with the ability to still survive exceptions without
inextricably increasing the number of gate delays (notice that I
say gate delays, NOT just how many gates).
Even though various RISCs have made various decisions, most of
them have been very careful to omit those things that CPU designers
have found difficult and/or expensive to implement, and especially,
things that are painful, for relatively little gain.
I would claim, that even as RISCs evolve, they may have certain
baggage that they'd wish weren't there.... but not very much. In
particular, there are a bunch of objective characteristics shared by
RISC ARCHITECTURES that clearly distinguish them from CISC
architectures.
I'll give a few examples, followed by the detailed analysis:
MOST RISCs:
- Have 1 size of instruction in an instruction stream
- And that size is 4 bytes
- Have a handful (1-4) of addressing modes (it is VERY hard to
count these things; will discuss later).
- Have NO indirect addressing in any form (i.e., where you need
one memory access to get the address of another operand in memory)
- Have NO operations that combine load/store with arithmetic,
i.e., like add from memory, or add to memory. (note: this
means especially avoiding operations that use the value of a
load as input to an ALU operation, especially when that
operation can cause an exception. Loads/stores with address
modification can often be OK as they don't have some of the
bad effects)
- Have no more than 1 memory-addressed operand per instruction
- Do NOT support arbitrary alignment of data for loads/stores
- Use an MMU for a data address no more than once per instruction
- Have >=5 bits per integer register specifier
- Have >= 4 bits per FP register specifier
These rules provide a rather distinct dividing line among
architectures, and I think there are rather strong technical reasons
for this, such that there is one more interesting attribute: almost
every architecture whose first instance appeared on the market from
1986 onward obeys the rules above...
Note that I didn't say anything about counting the number of
instructions...
So, here's a table:
- Age (1991): number of years since the first implementation sold in this
family (or the first machine with which this one is binary compatible).
Note: this table was first done in 1991, so year = 1991-(age in
table).
- 3a: # instruction sizes
- 3b: maximum instruction size in bytes
- 3c: number of distinct addressing modes for accessing data (not
jumps). I didn't count register or literal operands, but only modes
that referenced memory, and I counted different formats with
different offset sizes separately. This was hard work...
Also, even when a machine had different modes for
register-relative and PC-relative addressing, I counted them
only once.
- 3d: indirect addressing (0 - no, 1 - yes)
- 4a: load/store combined with arithmetic (0 - no, 1 - yes)
- 4b: maximum number of memory operands
- 5a: unaligned addressing of memory references allowed in
load/store, without specific instructions ( 0 - no never [MIPS,
SPARC, etc], 1 - sometimes [as in RS/6000], 2 - just about any
time)
- 5b: maximum number of MMU uses for data operands in an
instruction
- 6a: number of bits for integer register specifier
- 6b: number of bits for a 64-bit (or wider) FP register specifier,
distinct from integer registers
Note that all of these are ARCHITECTURE issues, and it is usually
quite difficult to either delete a feature (3a-5b) or increase the
number of real registers (6a-6b) given an initial instruction set
design. (Yes, register renaming can help, but...)
Now:
- items 3a, 3b, and 3c are an indication of the decode complexity,
- 3d-5b hint at the ease or difficulty of pipelining, especially
in the presence of virtual-memory requirements, and need to go
fast while still taking exceptions sanely
- items 6a and 6b are more related to ability to take good
advantage of current compilers.
There are some other attributes that can be useful, but I couldn't
imagine how to create metrics for them without being very subjective;
for example "degree of sequential decode", "number of writebacks that
you might want to do in the middle of an instruction, but can't,
because you have to wait to make sure you see all of the instruction
before committing any state, because the last part might cause a page
fault," or "irregularity/assymetricness of register use", or
"irregularity/complexity of instruction formats". I'd love to use
those, but just don't know how to measure them. Also, I'd be happy to
hear corrections for some of these.
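(To make the scoring explicit before the table, here is a small illustrative
C sketch; it is my restatement, not a tool from the posting. It encodes the
RULE row of the table below and counts how many of the ten architectural
attributes, 3a through 6b, fall on the non-RISC side. For the rows above the
line that count is the "# ODD" column; for the rows below the line, "# ODD"
instead counts the rules obeyed, i.e., 10 minus this count.)

    #include <stdio.h>

    /* Architectural attributes 3a-6b from the table, one struct per ISA. */
    struct isa {
        const char *name;
        int sizes;      /* 3a: # instruction sizes           RULE: == 1 */
        int maxsize;    /* 3b: max instruction size, bytes   RULE: == 4 */
        int modes;      /* 3c: # data addressing modes       RULE:  < 5 */
        int indirect;   /* 3d: indirect addressing?          RULE: == 0 */
        int memop_alu;  /* 4a: load/store + arithmetic?      RULE: == 0 */
        int memops;     /* 4b: max memory operands           RULE: == 1 */
        int unaligned;  /* 5a: unaligned access (0/1/2)      RULE:  < 2 */
        int mmu_uses;   /* 5b: max MMU uses per data op      RULE: == 1 */
        int ireg_bits;  /* 6a: bits of integer reg specifier RULE:  > 4 */
        int freg_bits;  /* 6b: bits of FP reg specifier      RULE:  > 3 */
    };

    /* Count attributes that fall on the non-RISC side of the RULE row. */
    static int odd_count(const struct isa *a)
    {
        int odd = 0;
        odd += (a->sizes != 1);
        odd += (a->maxsize != 4);
        odd += !(a->modes < 5);
        odd += (a->indirect != 0);
        odd += (a->memop_alu != 0);
        odd += (a->memops != 1);
        odd += !(a->unaligned < 2);
        odd += (a->mmu_uses != 1);
        odd += !(a->ireg_bits > 4);
        odd += !(a->freg_bits > 3);
        return odd;
    }

    int main(void)
    {
        /* Two rows of the table below: A1 and P3. */
        struct isa a1 = { "A1",  1,  4,  1, 0, 0, 1, 0,  1, 8, 3 };
        struct isa p3 = { "P3", 56, 56, 22, 1, 1, 6, 2, 24, 4, 0 };
        printf("%s: %d odd\n", a1.name, odd_count(&a1));  /* 1  */
        printf("%s: %d odd\n", p3.name, odd_count(&p3));  /* 10 */
        return 0;
    }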
So, here's a table of 12 implementations of various architectures,
one per architecture, with the attributes above. Just for fun, I'm
going to leave the architectures coded at first, although I'll identify
them later. I'm going to draw a line between H1 and L4 (obviously,
the RISC-CISC Line), and also, at the head of each column, I'm going
to put a rule, which, in that column, most of the RISCs obey.
Any RISC that does not obey it is marked with a +; any CISC that DOES
obey it is marked with a *. So...
CPU  | Age (1991) | 3a | 3b | 3c  | 3d | 4a | 4b | 5a | 5b | 6a | 6b | # ODD |
RULE |     <6     | =1 | =4 | <5  | =0 | =0 | =1 | <2 | =1 | >4 | >3 |       |
A1   |      4     |  1 |  4 |  1  |  0 |  0 |  1 |  0 |  1 |  8 | 3+ |   1   | RISC
B1   |      5     |  1 |  4 |  1  |  0 |  0 |  1 |  0 |  1 |  5 |  4 |   -   |
C1   |      2     |  1 |  4 |  2  |  0 |  0 |  1 |  0 |  1 |  5 |  4 |   -   |
D1   |      2     |  1 |  4 |  3  |  0 |  0 |  1 |  0 |  1 |  5 | 0+ |   1   |
E1   |      5     |  1 |  4 | 10+ |  0 |  0 |  1 |  0 |  1 |  5 |  4 |   1   |
F1   |      5     | 2+ |  4 |  1  |  0 |  0 |  1 |  0 |  1 | 4+ | 3+ |   3   |
G1   |      1     |  1 |  4 |  4  |  0 |  0 |  1 |  1 |  1 |  5 |  5 |   -   |
H1   |      2     |  1 |  4 |  4  |  0 |  0 |  1 |  0 |  1 |  5 |  4 |   -   |
-----+------------+----+----+-----+----+----+----+----+----+----+----+-------+
L4   |     26     |  4 |  8 |  2* | 0* |  1 |  2 |  2 |  4 |  4 |  2 |   2   | CISC
M2   |     12     | 12 | 12 | 15  | 0* |  1 |  2 |  2 |  4 |  3 |  3 |   1   |
N1   |     10     | 21 | 21 | 23  |  1 |  1 |  2 |  2 |  4 |  3 |  3 |   -   |
O3   |     11     | 11 | 22 | 44  |  1 |  1 |  2 |  2 |  8 |  4 |  3 |   -   |
P3   |     13     | 56 | 56 | 22  |  1 |  1 |  6 |  2 | 24 |  4 |  0 |   -   |
An interesting exercise is to analyze the ODD cases.
First, observe that of 12 architectures, in only 2 cases does an
architecture have an attribute that puts it on the wrong side of the line.
Of the RISCs:
- A1 is slightly unusual in having more integer registers, and
fewer FP registers, than usual. [Actually, slightly out of date: the
29050 is different, using an integer register bank instead, I hear.]
- D1 is unusual in sharing integer and FP registers (that's what
the D1:6b == 0).
- E1 seems odd in having a large number of address modes. I think
most of this is an artifact of the way that I counted, as this
architecture really only has a fundamentally small number of
ways to create addresses, but has several different-sized
offsets and combinations, all within one 4-byte instruction; I
believe that its addressing mechanisms are fundamentally MUCH
simpler than, for example, M2, or especially N1, O3, or P3, but
the specific number doesn't capture it very well.
- F1... is not sold any more.
- H1: one might argue that this processor has 2 sizes of
instructions, but I'd observe that at any point in the
instruction stream, the instructions are either 4 bytes long or
8 bytes long, with the setting done by a mode bit, i.e., not
dynamically encoded in every instruction.
Of the processors called CISCs:
- L4 happens to be one in which you can tell the length of the
instruction from the first few bits, has a fairly regular
instruction decode, has relatively few addressing modes, no
indirect addressing. In fact, a big subset of its instructions
are actually fairly RISC-like, although another subset is very
CISCy.
- M2 has a myriad of instruction formats, but fortunately avoided
indirect addressing, and actually, MOST instructions have only
1 address, except for a small set of string operations with
2. I.e., in this case, the decode complexity may be high, but
most instructions cannot turn into
multiple-memory-address-with-side-effects things.
- N1,O3, and P3 are actually fairly clean, orthogonal
architectures, in which most operations can consistently have
operands in either memory or registers, and there are relatively
few weirdnesses of special-cased uses of registers.
Unfortunately, they also have indirect addressing, instruction
formats whose very orthogonality almost guarantees sequential
decoding, where it's hard to even know how long an instruction
is until you parse each piece, and that may have side-effects
where you'd like to do a register write-back early, but either:
- must wait until you see all of the instruction until you
commit state or
- must have "undo" shadow-registers or
- must use instruction-continuation with fairly tricky
exception handling to restore the state of the machine.
It is also interesting to note that the original member of the
family to which O3 belongs was rather simpler in some of the critical
areas, with only 5 instruction sizes, of maximum size 10 bytes, and no
indirect addressing, and requiring alignment (i.e., it was a much more
RISC-like design, and it would be a fascinating speculation to know if
that extra complexity was useful in practice). Now, here's the table
again, with the labels:
CPU  | Age (1991) | 3a | 3b | 3c  | 3d | 4a | 4b | 5a | 5b | 6a | 6b | # ODD |             |
RULE |     <6     | =1 | =4 | <5  | =0 | =0 | =1 | <2 | =1 | >4 | >3 |       |             |
A1   |      4     |  1 |  4 |  1  |  0 |  0 |  1 |  0 |  1 |  8 | 3+ |   1   | AMD 29K     | RISC
B1   |      5     |  1 |  4 |  1  |  0 |  0 |  1 |  0 |  1 |  5 |  4 |   -   | MIPS R2000  |
C1   |      2     |  1 |  4 |  2  |  0 |  0 |  1 |  0 |  1 |  5 |  4 |   -   | SPARC V7    |
D1   |      2     |  1 |  4 |  3  |  0 |  0 |  1 |  0 |  1 |  5 | 0+ |   1   | MC88000     |
E1   |      5     |  1 |  4 | 10+ |  0 |  0 |  1 |  0 |  1 |  5 |  4 |   1   | HP PA       |
F1   |      5     | 2+ |  4 |  1  |  0 |  0 |  1 |  0 |  1 | 4+ | 3+ |   3   | IBM RT/PC   |
G1   |      1     |  1 |  4 |  4  |  0 |  0 |  1 |  1 |  1 |  5 |  5 |   -   | IBM RS/6000 |
H1   |      2     |  1 |  4 |  4  |  0 |  0 |  1 |  0 |  1 |  5 |  4 |   -   | Intel i860  |
-----+------------+----+----+-----+----+----+----+----+----+----+----+-------+-------------+
L4   |     26     |  4 |  8 |  2* | 0* |  1 |  2 |  2 |  4 |  4 |  2 |   2   | IBM 3090    | CISC
M2   |     12     | 12 | 12 | 15  | 0* |  1 |  2 |  2 |  4 |  3 |  3 |   1   | Intel i486  |
N1   |     10     | 21 | 21 | 23  |  1 |  1 |  2 |  2 |  4 |  3 |  3 |   -   | NSC 32016   |
O3   |     11     | 11 | 22 | 44  |  1 |  1 |  2 |  2 |  8 |  4 |  3 |   -   | MC 68040    |
P3   |     13     | 56 | 56 | 22  |  1 |  1 |  6 |  2 | 24 |  4 |  0 |   -   | VAX         |
General comment: this may sound weird, but in the long term, it
might be easier to deal with a really complicated bunch of instruction
formats, than with a complex set of addressing modes, because at least
the former is more amenable to pre-decoding into a cache of decoded
instructions that can be pipelined reasonably, whereas the pipeline on
the latter can get very tricky (examples to follow). This can lead to
the funny effect that a relatively "clean", orthogonal architecture may
actually be harder to make run fast than one that is less clean.
Obviously, every weirdness has its penalties...
But consider the fundamental difficulty
of pipelining something like (on a VAX):
ADDL @(R1)+,@(R1)+,@(R2)+
I.e., something that might theoretically arise from:
register **r1, **r2;
**r2++ = **r1++ + **r1++;
Now, consider what the VAX has to do:
- Decode the opcode (ADD)
- Fetch first operand specifier from I-stream and work on it.
- Compute the memory address from (r1)
If aligned
run through MMU
if MMU miss, fixup
access cache
if cache miss, do write-back/refill
Elseif unaligned
run through MMU for first part of data
if MMU miss, fixup
access cache for that part of data
if cache miss, do write-back/refill
run through MMU for second part of data
if MMU miss, fixup
access cache for second part of data
if cache miss, do write-back/refill
Now, in either case, we now have a longword that has the
address of the actual data.
- Increment r1 [well, this is where you'd LIKE to do it, or
in parallel with step 2a).] However, see later why not...
- Now, fetch the actual data from memory, using the address just
obtained, doing everything in step 2a) again, yielding the
actual data, which we need to stick in a temporary buffer, since it
doesn't actually go in a register.
- Now, decode the second operand specifier, which goes thru
everything that we did in step 2, only again, and leaves
the results in a second temporary buffer. Note that we'd
like to be starting this before we get done with all of 2
(and I THINK the VAX9000 probably does that??) but you
have to be careful to bypass/interlock on potential
side-effects to registers... actually, you may well have
to keep shadow copies of every register that might get
written in the instruction, since every operand can use
auto-increment/decrement. You'd probably want badly to try
to compute the address of the second argument and do the
MMU access interleaved with the memory access of the
first, although the ability of any operand to need 2-4 MMU
accesses probably makes this tricky. [Recall that any MMU
access may well cause a page fault...]
- Now, do the add. [could cause exception]
- Now, do the third specifier... only, it might be a little
different, depending on the nature of the cache, that is,
you cannot modify cache or memory, unless you know it will
complete. (Why? well, suppose that the location you are
storing into overlaps with one of the indirect-addressing
words pointed to by r1 or 4(r1), and suppose that the
store was unaligned, and suppose that the last byte of the
store crossed a page boundary and caused a page fault, and
that you'd already written the first 3 bytes. If you did
this straightforwardly, and then tried to restart the
instruction, it wouldn't do the same thing the second
time.)
- When you're sure all is well, and the store is on its way,
then you can safely update the two registers, but you'd
better wait until the end, or else, keep copies of any
modified registers until you're sure it's safe. (I think
both have been done?)
- You may say that this code is unlikely, but it is legal,
so the CPU must do it. This style has the following
effects:
- You have to worry about unlikely cases.
- You'd like to do the work, with predictable uses of
functional units, but instead, they can make
unpredictable demands.
- You'd like to minimize the amount of buffering and
state, but it costs you in both to go fast.
- Simple pipelining is very, very tough: for example,
it is pretty hard to do much about the next
instruction following the ADDL, (except some early
decode, perhaps), without a lot of gates for
special-casing. (I've always been amazed that CVAX
chips are as fast as they are, and VAX 9000s are REALLY
impressive...)
- EVERY memory operand can potentially cause 4 MMU uses,
and hence 4 MMU faults that might actually be page
faults...
- AND there are even worse cases, like the addp6
instruction, that can require *40* pages to be
resident to complete...
- Consider how "lazy" RISC designers can be:
- Every load/store uses exactly 1 MMU access.
- The compilers are often free to re-arrange the
order, even across what would have been the next
instruction on a CISC. This gets rid of some stalls
that the CISC may be stuck with (especially memory
accesses).
- The alignment requirement avoids especially the
problem with sending the first part of a store on
the way before you're SURE that the second part of
it is safe to do.
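(For contrast with the VAX walkthrough above, here is a rough sketch, mine
rather than the original posting's and not actual compiler output, of how the
same C statement decomposes on a load/store machine. Each statement
corresponds to roughly one simple instruction; every memory reference is a
separate load or store that uses the MMU exactly once, the pointer increments
are plain register operations, and memory is written only once, after all the
loads and the add have completed, so a fault at any point leaves no
partially-committed memory state to undo.)

    /* t1..t3, a, b, s play the role of registers. */
    void addl_risc_style(int ***r1p, int ***r2p)
    {
        int **r1 = *r1p, **r2 = *r2p;

        int *t1 = *r1;    /* load pointer       (1 MMU use; may fault, nothing to undo) */
        int  a  = *t1;    /* load first operand (1 MMU use) */
        r1 = r1 + 1;      /* register increment, no memory side effect */
        int *t2 = *r1;    /* load pointer */
        int  b  = *t2;    /* load second operand */
        r1 = r1 + 1;      /* register increment */
        int  s  = a + b;  /* register-register add */
        int *t3 = *r2;    /* load destination pointer */
        *t3 = s;          /* store: the only write to memory */
        r2 = r2 + 1;      /* safe to update now that the store is done */

        *r1p = r1;        /* hand the updated "registers" back to the caller */
        *r2p = r2;
    }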
Finally, to be fair, let me add the two cases that I knew of that were more
on the borderline: i960 and Clipper:
CPU  | Age (1991) | 3a | 3b | 3c | 3d | 4a | 4b | 5a | 5b | 6a | 6b | # ODD |             |
RULE |     <6     | =1 | =4 | <5 | =0 | =0 | =1 | <2 | =1 | >4 | >3 |       |             |
J1   |      5     | 4+ | 8+ | 9+ |  0 |  0 |  1 |  0 |  2 | 4+ | 3+ |   5   | Clipper     |
K1   |      3     | 2+ | 8+ | 9+ |  0 |  0 |  1 | 2+ |  - |  5 | 3+ |   5   | Intel 960KB |
(I think an ARM would be in this area as well; I think somebody
once sent me an ARM-entry, but I can't find it again; sorry.)
Note: slight modification (I'll integrate this sometime):
From jfc@MIT.EDU Mon Nov 29 12:59:55 1993
Subject: Re: Why are Motorola's slower than Intel's ? [really what's a RISC]
Newsgroups: comp.arch
Organization: Massachusetts Institute of Technology
Since you made your table IBM has released a couple chips that
support unaligned accesses in hardware even across cache line
boundaries and may store part of an unaligned object before taking a
page fault on the second half, if the object crosses a page boundary.
These are the RSC (single chip POWER) and PPC 601 (based on RSC
core).
John Carr (jfc@mit.edu)
(Back to me; jfc's comments are right; if I had time, I'd add another
line to do PPC... which, in some sense replays the S/360 -> S/370
history of relaxing alignment restrictions somewhat. I conjecture
that at least some of this was done to help Apple s/w migration.)
SUMMARY
- RISCs share certain architectural characteristics, although
there are differences, and some of those differences matter a
lot.
- However, the RISCs, as a group, are much more alike than the
CISCs as a group.
- At least some of these architectural characteristics have fairly
serious consequences on the pipelinability of the ISA, especially
in a virtual-memory, cached environment.
- Counting instructions turns out to be fairly irrelevant:
- It's HARD to actually count instructions in a meaningful
way... (if you disagree, I'll claim that the VAX is RISCier
than any RISC, at least for part of its instruction set :-)
Why: VAX has a MOV opcode, whereas RISCs usually have
a whole set of opcodes for {LOAD/STORE} {BYTE, HALF, WORD}
- More instructions aren't what REALLY hurts you, anywhere
near as much as features that are hard to pipeline do
- RISCs can perfectly well have string-support, or decimal
arithmetic support, or graphics transforms... or lots of
strange register-register transforms, and it won't cause
problems..... but compare that with the consequence of
adding a single instruction that has 2-3 memory operands,
each of which can go indirect, with auto-increments,
and unaligned data...
PART II - ADDRESSING MODES
Article: 30346 of comp.arch
Path: odin!mash.wpd.sgi.com!mash
Subject: Updated addressing mode table
Message-ID: < C52tAM.K4B@odin.corp.sgi.com>
Nntp-Posting-Host: mash.wpd.sgi.com
I promised to repost this with fixes, and people have been asking
for it, so here it is again: if you saw it before, all that's really
different is some fixes in the table, and a few clarified
explanations:
THE GIANT ADDRESSING MODE TABLE (Corrections happily accepted)
This table goes with the higher-level table of general architecture
characteristics.
Address mode summary
r  | register              |
r+ | autoincrement (post)  | [by size of data object]
-r | autodecrement (pre)   | [by size, ... and this was the one I meant]
>r | modify base register  | [generally, effective address -> base]
                             NOTE: sometimes this subsumes r+, -r, etc.,
                             and is more general, so I categorize it as a
                             separate case.
d  | displacement          | [d1 & d2 if 2 different displacements]
x  | index register        |
s  | scaled index          |
a  | absolute              | [as a separate mode, as opposed to displacement+(0)]
I  | indirect              |
Shown below are 22 distinct addressing modes [you can argue
whether these are right categories]. In the table are the *number* of
different encodings/variations [and this is a little fuzzy; you can
especially argue about the 4 in the HP PA column, I'm not even sure
that's right]. For example, I counted as different variants on a mode
the case where the structure was the same, but there were
different-sized displacements that had to be decoded. Note that
meaningfully counting addressing modes is *at least as bad* as
meaningfully counting opcodes; I did the best I could, and I spent a
lot of hours looking at manuals for the chips I hadn't programmed
much, and in some cases, even after hours, it was hard for me to
figure out meaningful numbers... *Most* of these architectures are used
in general-purpose systems and *most* have at least one version that
uses caches: those are important because many of the issues in
thinking about addressing modes come from their interactions with MMUs
and caches...
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | | 8 | 9 | 10 | 11 | 12 | | 13 | 14 | 15 | | 16 | 17 | 18 | 19 | 20 | 21 | 22
| | | | | | | | | | | | | | | | | | | | | | | | r | r
| | | | | | | | | | | | | | | | | | | | | r | r | r | +d1 | +d1
| | | | | r | r | r | | | | | | | | | r | r | | | r | r+ | +d | +d1 | I | +s
| | r | r | r | +d | +x | +s | | | | | s+ | s+ | | s+ | +d | +d | | r+ | +d | I | I | I | +s | I
| r | +d | +x | +s | >r | >r | >r | | r+ | -r | a | a | r+ | | -r | +x | +s | | I | I | +s | +s | +d2 | +d2 | +d2
AMD 29K | 1 | . | . | . | . | . | . | | . | . | . | . | . | | . | . | . | | . | . | . | . | . | . | .
Rxxx | . | 1 | . | . | . | . | . | | . | . | . | . | . | | . | . | . | | . | . | . | . | . | . | .
SPARC | . | 1 | 1 | . | . | . | . | | . | . | . | . | . | | . | . | . | | . | . | . | . | . | . | .
88K | . | 1 | 1 | 1 | . | . | . | | . | . | . | . | . | | . | . | . | | . | . | . | . | . | . | .
HP PA | . | 2 | 1 | 1 | 4 | 1 | 1 | | . | . | . | . | . | | . | . | . | | . | . | . | . | . | . | .
ROMP | 1 | 2 | . | . | . | . | . | | . | . | . | . | . | | . | . | . | | . | . | . | . | . | . | .
POWER | . | 1 | 1 | . | 1 | 1 | . | | . | . | . | . | . | | . | . | . | | . | . | . | . | . | . | .
i860 | . | 1 | 1 | . | 1 | 1 | . | | . | . | . | . | . | | . | . | . | | . | . | . | . | . | . | .
Swrdfish | 1 | 1 | 1 | . | . | . | . | | . | . | 1 | . | . | | . | . | . | | . | . | . | . | . | . | .
ARM | 2 | 2 | . | 2 | 1 | . | 1 | | 1 | 1 | . | . | . | | . | . | . | | . | . | . | . | . | . | .
Clipper | 1 | 3 | 1 | . | . | . | . | | 1 | 1 | 2 | . | . | | . | . | . | | . | . | . | . | . | . | .
i960KB | 1 | 1 | 1 | 1 | . | . | . | | . | . | 2 | 2 | . | | . | 1 | . | | . | . | . | . | . | . | .
| . | . | . | . | . | . | . | | . | . | . | . | . | | . | . | . | | . | . | . | . | . | . | .
S/360 | . | 1 | . | . | . | . | . | | . | . | . | . | . | | . | 1 | . | | . | . | . | . | . | . | .
i486 | 1 | 3 | 1 | 1 | . | . | . | | 1 | 1 | 2 | . | . | | . | 2 | 3 | | . | . | . | . | . | . | .
NSC32K | . | 3 | . | . | . | . | . | | 1 | 1 | 3 | 3 | . | | . | . | 3 | | . | . | . | . | 9 | . | .
MC68000 | 1 | 1 | . | . | . | . | . | | 1 | 1 | 2 | . | . | | . | 2 | . | | . | . | . | . | . | . | .
MC68020 | 1 | 1 | . | . | . | . | . | | 1 | 1 | 2 | . | . | | . | 2 | 4 | | . | . | . | . | . | 16 | 16
VAX | 1 | 3 | . | 1 | . | . | . | | 1 | 1 | 1 | 1 | 1 | | 1 | . | 3 | | 1 | 3 | 1 | 3 | . | . | .
COLUMN NOTES:
- Columns 1-7 are addressing modes used by many machines, but very few,
if any, clearly-RISC architectures use anything else. They are all
characterized by what they don't have:
- 2 adds needed before generating the address
- indirect addressing
- variable-sized decoding
- Columns 13-15 include fairly simple-looking addressing modes,
which however, *may* require 2 back-to-back adds before the address is
available. [*may* because some of them use index-register=0 or
something to avoid indexing, and usually in such machines, you'll see
variable timing figures, depending on use of indexing.]
- Columns 16-22 use indirect addressing. (The sketch below contrasts
the address arithmetic of these three groups.)
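(The following is my own illustration, not part of the original posting: the
effective-address arithmetic implied by each of the three groups of columns,
written as C functions. base, index, and the displacements stand for register
and instruction fields, and each call to load() stands for one data-memory
access, i.e., one MMU use.)

    #include <stdint.h>

    extern uint32_t load(uint32_t ea);   /* one data-memory access = one MMU use */

    /* Columns 1-7: at most one add, and no memory access just to form the address. */
    uint32_t simple_mode(uint32_t base, uint32_t disp)
    {
        return load(base + disp);
    }

    /* Columns 13-15: two dependent adds before the address is available. */
    uint32_t two_add_mode(uint32_t base, uint32_t index, unsigned scale, uint32_t disp)
    {
        return load(base + (index << scale) + disp);
    }

    /* Columns 16-22: memory indirect -- an extra load (and MMU use) just to
       obtain the address, before the load of the actual data. */
    uint32_t indirect_mode(uint32_t base, uint32_t disp1, uint32_t disp2)
    {
        uint32_t ptr = load(base + disp1);
        return load(ptr + disp2);
    }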
ROW NOTES
- Clipper & i960, of current chips, are more on the RISC-CISC border,
or are sort of "modern CISCs". ARM is also characterized (by ARM people,
Hot Chips IV: "ARM is not a pure RISC").
- ROMP has a number of characteristics different from the rest of the RISCs;
you might call it "early RISC", and it is of course no longer made.
- You might consider HP PA a little odd, as it appears to have
more addressing modes, in the same way that CISCs do, but I don't
think this is the case: it's an issue of whether you call something
several modes or one mode with a modifier, just as there is trouble
counting opcodes (with & without modifiers). From my view, neither PA
nor POWER has truly "CISCy" addressing modes.
- Notice difference between 68000 and 68020 (and later 68Ks): a
bunch of incredibly-general & complex modes got added...
- Note that the addressing on the S/360 is actually pretty simple,
mostly base+displacement, although RX-addressing does take 2
regs+offset.
- A dimension *not* shown on this particular chart, but also
highly relevant, is that this chart shows the different *types* of
modes, *not* how many addresses can be found in each instruction.
That may be worth noting also:
AMD - i960      | 1 | one address per instruction
S/360 - MC68020 | 2 | up to 2 addresses
VAX             | 6 | up to 6
By looking at alignment, indirect addressing, and looking only at
those chips that have MMUs, consider the number of times an MMU
*might* be used per instruction for data address translations:
AMD - Clipper   |  2 | [Swordfish & i960KB: no TLB]
S/360 - NSC32K  |  4 |
MC68Ks (all)    |  8 |
VAX             | 24 |
When RS/6000 does unaligned, it must be in the same cache line
(and thus also in same MMU page), and traps to software otherwise,
thus avoiding numerous ugly cases.
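(A tiny illustrative predicate for the check just described, mine rather than
IBM's actual logic: an unaligned access is handled in hardware only if all of
its bytes fall within one cache line, which also keeps it within one MMU page;
otherwise it traps to software. The line size is simply a parameter here.)

    #include <stdint.h>

    /* Nonzero if an access of `size` bytes at `addr` stays within a single
       cache line of `line_bytes` bytes (line_bytes must be a power of two). */
    static int within_one_line(uint64_t addr, unsigned size, unsigned line_bytes)
    {
        uint64_t mask = ~(uint64_t)(line_bytes - 1);
        return (addr & mask) == ((addr + size - 1) & mask);
    }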
Note: in some sense, S/360s & VAXen can use an arbitrary number of
translations per instruction, with MOVE CHARACTER LONG or similar
operations, but I don't count them as more, because they're defined to be
interruptible/restartable, saving state in general-purpose registers,
rather than hidden internal state.
SUMMARY
- Computer design styles mostly changed from machines with:
- 2-6 addresses per instruction, with variable-sized encoding
- address specifiers were usually "orthogonal", so that any could
go anywhere in an instruction
- sometimes indirect addressing
- sometimes need 2 adds *before* effective address is available
- sometimes with many potential MMU accesses (and possible exceptions)
per instruction, often buried in the middle of the instruction,
- and often *after* you'd normally want to commit state because
of auto-increment or other side effects.
to machines with:
- 1 address per instruction
- address specifiers encoded in small # of bits in 32-bit instruction
- no indirect addressing
- never need 2 adds before address available
- use MMU once per data access
and we usually call the latter group RISCs. I say "changed"
because if you put this table together with the earlier one,
which has the age in years, the older ones were one way, and the
newer ones are different.
- Now, ignoring any other features, but looking at this single
attribute (architectural addressing features and implementation
effects thereof), it ought to be clear that the machines in the
first part of the table are doing something *technically*
different from those in the second part of the table. Thus,
people may sometimes call something RISC that isn't, for
marketing reasons, but the people calling the first batch RISC
really did have some serious technical issues at heart.
- One more time: this is *not* to say that RISC is better than
CISC, or that the few in the middle are bad, or anything
like that... but that there are clear technical
characteristics...
PART III - MORE ON TERMINOLOGY; WOULD YOU CALL THE CDC 6600 A RISC?
Article: 39495 of comp.arch
Newsgroups: comp.arch
From: mash@mash.engr.sgi.com (John R. Mashey)
Subject: Re: Why CISC is bad (was P6 and Beyond)
Organization: Silicon Graphics, Inc.
Date: Wed, 6 Apr 94 18:35:01 PDT
In article <2nii0d$kkn@crl2.crl.com>, dbennett@crl.com (Andrea Chen) writes:
You may be correct on the creation of the term, but RISC does
refer to a school of computer design that dates back to the early
seventies.
This is all getting fairly fuzzy and subjective, but it seems very
confusing to label RISC as a school of thought that dates back to the
early 1970s.
- One can say that RISC is a school of thought that got popular in
the early-to-mid 80's, and got widespread commercial use then.
- One can say that there were a few people (like John Cocke & co
at IBM) who were doing RISC-style research projects in the mid-70s.
- But if you want to go back, as has been discussed in this
newsgroup often, a lot of people go back to the CDC 6600, whose
design started in 1960, and was delivered in 4Q 1964. Now,
while this wouldn't exactly fit the parameters of current
RISCs, a great deal of the RISC-style approach was there in the
central processor ISA:
- Load/store architecture.
- 3-address register-register instructions
- Simply-decoded instruction set
- Early use of instruction scheduling by the compiler, and the
expectation that you'd usually program in a high-level
language and not often in assembler, as you'd expect the
compiler to do well.
- More registers than common at the time
- ISA designed to make decode/issue easy
Note that the 360/91 (1967) offered a good example of
building a CISC architecture into a high-performance machine,
and was an interesting comparison to the 6600.
- Maybe there is some way to claim that RISC goes back to the
1950s, but in general, most machines of the 1950s and 1960s
don't feel very RISCy (to me). Consider Burroughs B5000s; IBM
709x, 707x, 1401s; Univac 110x; GE 6xx, etc, and of course,
S/360s. Simple load/store architectures were hard to find;
there were often exciting instruction decodings required;
indirect addressing was popular; machines often had very few
accumulators.
- If you want to try sticking this in the matrix I've published
before, as best as I recall, the 6600 ISA generally looked like:
CPU  | Age (1991) | 3a | 3b | 3c | 3d | 4a | 4b | 5a | 5b | 6a | 6b | # ODD              |          |
RULE |     <6     | =1 | =4 | <5 | =0 | =0 | =1 | <2 | =1 | >4 | >3 |                    |          |
Q1   |    -28     |  2 |  * |  1 |  0 |  0 |  1 |  0 |  1 |  3 |  3 | 4 (but ~1 if fair) | CDC 6600 |
That is:
- 2: it has 2 instruction sizes (not 1), 15 & 30 bits
(however, these were packed into 60-bit words, so if you had 15,
30, 30, the second 30-bit instruction would not cross the word
boundary, but would start in the second word).
- *: 15-and-30 bit instructions, not 32-bit.
- 1: 1 addressing mode [Note: Tim McCaffrey emailed me that
one might consider there to be more, i.e., you could set an
address register to combinations of the others to give
autoincrement/decrement/index+offset, etc.]. In any case,
you compute an address as a simple combination of 1-2
registers, and then use the address, without further side-effects.
- 0: no indirect addressing
- 1: have one memory operand per instruction
- 0: do NOT support arbitrary alignment of operands in
memory (well, it was a word-addressed machine :-)
- 1: use an MMU for data translation no more than once per
instruction (MMU used loosely here)
- 3,3: had 3-bit fields for addressing registers, both index
and FP
Now, of the 10 ISA attributes I'd proposed for identifying
typical RISCs, the CDC 6600 obeys 6. It varies in having 2
instruction formats, and in having only 3 bits for register fields,
but it had simple packing of the instructions into fixed-size words,
and register/accumulators were pretty expensive in those days (some
popular machines only had one accumulator and a few index registers,
so 8 of each was a lot). Put another way: it had about as many
registers as you'd conveniently build in a high-speed machine, and
while they packed 2-4 operations into a 60-bit word, the decode was
pretty straightforward. Anyway, given the caveats, I'd claim that the
6600 would fit much better in the RISC part of the original table...
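(To illustrate how straightforward that decode can be, here is a toy C sketch
of my own, not the 6600's real logic: walking the 15-bit parcels of one 60-bit
word, where a hypothetical is_long() test on a parcel's opcode bits decides
whether it begins a 15- or 30-bit instruction, and a 30-bit instruction that
will not fit in the bits remaining in the current word simply begins in the
next word instead.)

    #include <stdint.h>

    extern int  is_long(uint32_t parcel);        /* hypothetical: opcode says 30-bit? */
    extern void issue(uint64_t bits, int len);   /* hand the decoded instruction on   */

    /* Decode one 60-bit word (held in the low 60 bits of `word`):
       up to four 15-bit parcels, and no instruction crosses the word boundary. */
    void decode_word(uint64_t word)
    {
        int pos = 60;                            /* bits left in this word */
        while (pos >= 15) {
            uint32_t p = (uint32_t)((word >> (pos - 15)) & 0x7FFF);
            if (is_long(p)) {
                if (pos < 30)
                    break;                       /* can't fit: starts in the next word */
                issue((word >> (pos - 30)) & 0x3FFFFFFF, 30);
                pos -= 30;
            } else {
                issue(p, 15);
                pos -= 15;
            }
        }
    }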
PART IV - RISC, VLIW, STACKS
Article: 43173 of comp.arch
Newsgroups: comp.sys.amiga.advocacy,comp.arch
From: mash@mash.engr.sgi.com (John R. Mashey)
Subject: Re: PG: RISC vs. CISC was: Re: MARC N. BARR
Date: Thu, 15 Sep 94 18:33:14 PDT
In article <35a1a3$mlb@doc.armltd.co.uk>, Clive.Jones@armltd.co.uk
writes:
Really? The Venerable John Mashey's table appears to contain as
many exceptions to the rule about number of GP registers as most
others. I'm sure if one were to look at the various less conventional
processors, there would be some clearly RISC processors that didn't
have a load-store architecture - stack and VLIW processors spring to
mind.
I'm not sure I understand the point. One can believe any of
several things:
- One can believe RISC is some marketing term without technical
meaning whatsoever. OR
- One can believe that RISC is some collection of implementation
ideas. This is the most common confusion.
- One can believe that RISC has some ISA meaning (such as RISC ==
small number of opcodes)... but have a different idea of RISC
than do most chip architects who build them. If you want to pay
words extra money every Friday to mean something different than
what they mean to practitioners... then you are free to do so,
but you will have difficulty communicating with practitioners
if you do so.
EX: I'm not sure how stack architectures are "clearly RISC" (?)
Maybe CRISP, sort of. Burroughs B5000 or Tandem's original
ISA: if those are defined as RISC, the term has been rendered
meaningless.
EX: VLIWs: I don't know any reason why I'd call VLIWs, in
general, either clearly RISC or clearly not. VLIW is a technique
for issuing instructions to more functional units than you
have the die space/cycle time to decode more dynamically.
There gets to be a fuzzy line between:
- A VLIW, especially if it compresses instructions in
memory, then expands them out when brought into the cache.
- A superscalar RISC, which does some predecoding on the
way from memory->cache, adding "hint" bits or rearranging
what it keeps there, speeding up cache->decode->issue.
At least some VLIWs are load/store architectures, and
the operations they do usually look like typical RISC
operations. OR, you can believe that:
- RISC is a term used to characterize a class of
relatively-similar ISAs mostly developed in the 1980s.
Thus, if a knowledgeable person looks at ISAs, they will
tend to cluster various ISAs as:
- Obvious RISC, fits the typical rules with few exceptions.
- Obviously not-RISC, fits the inverse of the RISC
rules with relatively few exceptions. Sometimes
people call this CISC... but whereas RISCs, as a group,
have relatively similar ISAs, the CISC label is sometimes
applied to a widely varying set of ISAs.
- Hybrid / in-the-middle cases, that either look like
CISCy RISCs, or RISCy CISCs. There are a few of these.
Cases 1-3 may apply to reasonably contemporaneous
processors, and make some sense. And then there is 4):
- CPUs for which RISC/CISC is probably not a very relevant
classification. I.e., one can apply the set of rules
I've suggested, and get an exception-count, but it may
not mean much in practice, especially when applied to
older CPUs created with vastly different constraints than
current ones, or embedded processors, or specialized ones.
Sometimes an older CPU might have been designed with
some similar philosophies (i.e., like CDC 6600 & RISC,
sort of) whether or not it happened to fit the rules.
Sometimes, die-space constraints may have led to "simple"
chips, without making them fit the suggested criteria either.
Personally, torturous arguments about whether a 6502, or
a PDP-8, or a 360/44 or an XDS Sigma 7, etc, are RISC
or CISC... do not usually lead to great insight.
After a while such arguments are counting angels
dancing on pinheads ("Ahh, only 10 angels, must be RISC" :-).
In this belief space, one tends to follow Hennessy & Patterson's
comment in E.9 that "In the history of computing, there has never been
such widespread agreement on computer architecture." None of this is
pejorative of earlier architectures, just the observation that the
ISAs newly developed in the 1980s were far more similar than the
earlier groups of ISAs. [I recall a 2-year period in which I used IBM
1401, IBM 7074, IBM 7090, Univac 1108, and S/360, of which only the
7090 and 1108 bore even the remotest resemblance to each other, i.e.,
at least they both had 36-bit words.]
SUMMARY
RISC is a label most commonly used for a set of ISA characteristics
chosen to ease the use of aggressive implementation techniques found
in high-performance processors (regardless of RISC, CISC, or
irrelevant). This is a convenient shorthand, but that's all, although
it probably makes sense to use the term the way it's usually meant by
people who do chips for a living.
-john mashey DISCLAIMER: < generic disclaimer, I speak for me only, etc>
UUCP: mash@sgi.com
DDD: 415-390-3090 FAX: 415-967-8496
USPS: Silicon Graphics 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311
--
-john mashey DISCLAIMER: < generic disclaimer: I speak for me only...>
EMAIL: mash@sgi.com DDD: 415-933-3090 FAX: 415-967-8496
USPS: Silicon Graphics/Cray Research 6L-005,
2011 N. Shoreline Blvd, Mountain View, CA 94043-1389