Market Overview: Memories are Made of This

Chris Long takes a trip down memory lane to investigate the rampant regeneration of the Ram chip

When people who know about computers get together in the pub to discuss the old days, all sorts of things suggest themselves for the agenda. But before too long the conversation will invariably get around to how much things have improved. The inexorable march of technology will be discussed at great length while each person tries to avoid buying the next drink.

The subjects are obvious: processors, and how the 8086 had 29,000 transistors compared with the 5.5 million of the Pentium Pro; hard disks, from 10Mb on the PC XT to 2Gb 14 years later; and how computer types still have no social life to speak of. And while this is all true, one of the major technologies that has almost changed beyond recognition goes unmentioned.

How quickly everyone forgets, which is ironic, because it is memory.

Not so long ago, if you dropped 64Kb of memory on your foot, likely as not you'd lose it: not the memory, the foot. This was in the good old days when memory was literally laced or knitted together: tiny ferrous rings stitched together with fine wire and the whole thing soldered on to a special printed circuit board (PCB). A number of these PCBs would be connected in parallel, each representing a bit, so a byte took eight PCBs.

We seem to have forgotten that those small beetle-sized chips inside our computers used to be hundreds of times bigger, and slower. Now we are talking of getting 256Mbits on one chip, eight of which take up the space of a packet of chewing gum. Now that's progress. And maybe the biggest surprise is that, despite the great leaps forward, these chips work in almost exactly the same way as their previous incarnations.

The best way to think of memory is as a wall filled with mailbox pigeonholes, numbered across the top and down the side. Selecting a pigeonhole is a simple matter of specifying its row and column address: you now have whatever is in that pigeonhole. This is how memory works, whether within the chip or on the laced PCB: a location is the combination of two addresses, and the result of the search depends on whether there is a one or a nought stored there.
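To make the pigeonhole picture concrete, here is a toy C sketch (the array size and all the names are invented for the example): given a row and a column address, it hands back the one or nought stored in that pigeonhole.

#include <stdio.h>

#define ROWS 8
#define COLS 8

/* A toy "memory array": one bit per pigeonhole. */
static int cells[ROWS][COLS];

/* Select a pigeonhole by its row and column address
   and return the one or nought stored there. */
int read_bit(int row, int col)
{
    return cells[row][col];
}

int main(void)
{
    cells[3][5] = 1;   /* store a one at row 3, column 5 */
    printf("bit at (3,5) = %d\n", read_bit(3, 5));
    return 0;
}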

Timing is important here because there is a lot going on. First, there's what is happening in the rest of the computer. This is defined by the clock speed of the processor, which has to be in step with the speed of the motherboard and the bus. Second, there is the perfectly healthy but irritating tendency of the memory chip to leak away its electrical charge. When this happens the information stored in it goes with it, so there is a system of refresh cycles in which the chip is regularly recharged and the information stays put.

It doesn't take a rocket scientist to work out that this refresh cycle has to happen at a different time from any access the processor makes. This is complicated by the delay, or latency, of the memory system itself, which is defined as the time between receiving a request for information and that information appearing on the memory's output bus. It is these timings, and how they work together, that account for the nanosecond ratings on memory chips.

In essence, getting data out of a Ram chip involves putting a signal on the row address strobe (RAS) line and another on the column address strobe (CAS) line, then waiting a prescribed time for the data to appear and reading it into the output buffer.
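In toy C terms, the sequence looks something like this; the names, sizes and data are made up for the example, and a real chip's access time is only hinted at in the comments.

#include <stdio.h>

#define ROWS 4
#define COLS 4

static int cell[ROWS][COLS] = { {0,1,0,1}, {1,0,0,1},
                                {0,0,1,0}, {1,1,0,0} };

static int output_buffer;

/* A toy model of a DRam read: RAS latches the row address,
   CAS latches the column, and after the prescribed access
   time the data is latched into the output buffer. */
void dram_read(int row, int col)
{
    /* RAS: activate the whole row.                         */
    /* CAS: pick the column within that row.                */
    /* ...a real chip now needs its access time, e.g. 60ns  */
    output_buffer = cell[row][col];  /* data appears on the output */
}

int main(void)
{
    dram_read(2, 2);
    printf("output buffer holds %d\n", output_buffer);
    return 0;
}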

The timings are needed because the output side of the memory has to know when to stop waiting for data to appear (the bits may appear within a couple of nanoseconds of each other). This way it can take the data on the memory output and write it to where the processor wants it without worrying that it has left a bit or two behind. That's the principle, but the practice is slightly different. It's called fast page mode (FPM).

Just as before, a memory read access begins with the activation of a row in the Ram array, followed by the column. Once the correct piece of information is found, the column deactivates and gets ready for the next cycle. This introduces a wait state, because nothing is happening while the column is deactivating ? the CPU must wait for the memory to complete its cycle.

In FPM, the next column in the row now activates in anticipation of the fact that the next piece of data required might be in the memory location next to the previous column. This activation of the next column only works well with sequential reads from memory in a given row. This is an improvement, but is still only hit and miss.
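A crude cost model shows why this pays off for sequential reads; the cycle counts below are invented purely for illustration.

#include <stdio.h>

#define ROW_OPEN_COST 3   /* illustrative cycles to activate a row */
#define COL_READ_COST 2   /* illustrative cycles per column access */

/* In fast page mode the row is opened once, and each further
   read in the same row costs only a column access. */
int fpm_burst_cost(int reads_in_same_row)
{
    return ROW_OPEN_COST + reads_in_same_row * COL_READ_COST;
}

int main(void)
{
    /* Four sequential reads from one row: the row opens once. */
    printf("4 reads, same row: %d cycles\n", fpm_burst_cost(4));
    /* Four scattered reads, each in a new row: hit and miss.  */
    printf("4 reads, 4 rows:   %d cycles\n", 4 * fpm_burst_cost(1));
    return 0;
}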

Enter extended data out (EDO) Ram. EDO was introduced into mainstream PCs about 18 months ago and has since become the memory of choice for most system vendors. EDO works much like FPM: a row of memory is activated and then the column is activated. But when the piece of information is found, instead of deactivating the column and turning off the output buffer (like FPM), EDO memory keeps the output data buffer on until the next column access or next read cycle begins. By keeping the buffer on, EDO eliminates a wait state. The good news is EDO is easy to implement and costs virtually the same as FPM.

An extension of EDO technology is waiting to be taken up. Called Burst EDO (BEDO), it is even faster than EDO. Since most PC applications access memory in four-cycle bursts, once the first address is known the next three, so the logic goes, can quickly be provided by the DRam. Thus the main difference is a 2-bit address counter and a few extra bits of logic to work out the forthcoming addresses, making it faster but a bit more expensive than EDO Ram.
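The counter itself is simple enough to sketch in C; the starting address is invented, and a real chip's burst ordering may differ.

#include <stdio.h>

int main(void)
{
    int start_col = 12;   /* first column address from the chipset */

    /* BEDO's trick: keep the high bits of the address and let a
       2-bit counter supply the low bits, giving a four-beat burst
       from a single address. */
    for (int beat = 0; beat < 4; beat++) {
        int col = (start_col & ~3) | ((start_col + beat) & 3);
        printf("burst beat %d -> column %d\n", beat, col);
    }
    return 0;
}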

Of course, in an ideal world these delays would be less than the cycle time of the processor, so the data would be there every time the processor came back for it. Alas, it isn't possible for standard Ram chips: a Pentium running at 133MHz has a cycle time of about 7.5ns, so connecting it to 60ns Ram chips would mean it had to wait several clock cycles every time it accessed memory.
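The sums behind that claim, using the figures above:

#include <stdio.h>

int main(void)
{
    double cpu_mhz  = 133.0;
    double cycle_ns = 1000.0 / cpu_mhz;   /* about 7.5ns          */
    double dram_ns  = 60.0;               /* a typical Ram rating */

    printf("processor cycle time: %.1fns\n", cycle_ns);
    printf("processor cycles per Ram access: about %.0f\n",
           dram_ns / cycle_ns);
    return 0;
}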

This was quickly spotted as processors got faster, and in the late 1980s Intel produced a second-level cache controller chip for the 386. This was a small amount of static Ram (SRam) that held data the processor had already fetched from Ram and was likely to ask for again. SRam has the advantage of not needing a refresh cycle and is thus a lot faster, at 15ns or 12ns. Nowadays it is referred to as L2 cache and is standard issue on just about all processors.

This approach was extended in the 486 with blocks of cache Ram put on the processor chip itself, called first-level or primary cache. The primary cache in turn caches the second-level cache, speeding up the transfer of data to the CPU even more.
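A much-simplified model of the caching idea, in C: before going out to slow Ram, check a small fast table of recently used addresses. Real caches use tags, sets and lines; this toy keeps one entry per slot, and all the names are invented.

#include <stdio.h>

#define CACHE_SLOTS 16

struct slot { int valid; int addr; int data; };
static struct slot cache[CACHE_SLOTS];

/* Stand-in for a slow DRam access. */
int slow_dram_read(int addr) { return addr * 2; }

int cached_read(int addr)
{
    struct slot *s = &cache[addr % CACHE_SLOTS];
    if (s->valid && s->addr == addr)
        return s->data;            /* hit: no DRam access needed */
    s->valid = 1;                  /* miss: fetch and remember it */
    s->addr  = addr;
    s->data  = slow_dram_read(addr);
    return s->data;
}

int main(void)
{
    cached_read(42);               /* first read misses, goes to DRam */
    printf("second read of 42: %d (from cache)\n", cached_read(42));
    return 0;
}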

The simplest form of SRam uses an asynchronous design, in which the CPU sends an address to the cache, the cache looks up the address, then returns the data using about three or four processor cycles.

Synchronous cache buffers incoming addresses to spread the address lookup routine over two or more clock cycles. SRam stores the requested address in a register during the first clock cycle. During the second, it retrieves the data and delivers it. Since the address is stored in the register, synchronous SRam can then receive the next data address internally while the CPU is reading the data from the previous request.

Synchronous SRam can then 'burst' subsequent data elements without receiving and decoding additional addresses from the chipset, and response time is reduced to 8.5 to 12ns.
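A toy model of that overlap, with invented names and data: each clock tick delivers the data for the address latched on the previous tick while latching the next one.

#include <stdio.h>

#define NO_ADDR -1

static int memory[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };
static int addr_register = NO_ADDR;   /* stage one: latched address */

/* One clock tick: service the registered address and latch
   the next one at the same time. */
int clock_tick(int next_addr)
{
    int out = (addr_register == NO_ADDR) ? NO_ADDR
                                         : memory[addr_register];
    addr_register = next_addr;        /* overlap: accept next address */
    return out;
}

int main(void)
{
    clock_tick(3);   /* cycle 1: latch address 3, nothing out yet */
    printf("cycle 2 delivers %d while latching address 5\n",
           clock_tick(5));
    printf("cycle 3 delivers %d\n", clock_tick(NO_ADDR));
    return 0;
}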

Another type of synchronous SRam is called pipelined burst. In essence, pipelining adds an output stage that buffers data reads from the memory locations so that subsequent memory reads are accessed quickly. Pipelining gives address/data times of 4.5 to 8ns.

That is the current state of play, but things are beginning to change again. Intel has flagged synchronous DRam (SDRam) as the way forward. Until now, dynamic Ram access has been asynchronous. In essence, data transfers are controlled by exchanging signals between the Ram chips and the processor.

The core of an SDRam device is a standard Ram chip with the addition of synchronous control logic. The idea is to synchronise all address, data and control signals with a single system clock, thus simplifying design and giving faster data transfer.

Despite its resemblance to standard DRam, the SDRam architecture brings many improvements. For instance, each SDRam chip has two separate banks of memory cells which can be interleaved, allowing two different rows to be active at the same time so that an access to one bank can be handled while the other is preparing for the next access.
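In sketch form, with everything invented except the principle, the two banks take it in turns: one answers an access while the other prepares its next row.

#include <stdio.h>

static const char *state[2] = { "idle", "idle" };

void access_bank(int bank)
{
    state[bank]     = "serving access";
    state[bank ^ 1] = "preparing next row";   /* the other bank */
    printf("bank 0: %-18s bank 1: %s\n", state[0], state[1]);
}

int main(void)
{
    /* Alternating accesses ping-pong between the two banks. */
    access_bank(0);
    access_bank(1);
    access_bank(0);
    return 0;
}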

Not content with juggling technologies at the chip level, the industry is also playing with the way they plug into the motherboard. Over the past few years, the 72-pin Simm has been slowly taking over from the 30-pin Simm. Its development was driven by the arrival of 32-bit processors from Intel and Motorola. While a CPU would require four 30-pin (8-bit wide) Simms in each bank to get its 32 data bits, just one 72-pin Simm bank can be used to provide the CPU with 32 bits.

Even that is changing; a year ago, most machines used 72-pin Simms, but today resellers are faced with 144-pin and 168-pin dual inline memory modules (Dimms). The benefit of a Dimm is its 64-bit wide data path, so only one is needed in a Pentium board, which would normally require two 72-pin, 32-bit Simms. Dimms themselves aren't new: they've been used in Macs for years.

Even the pins themselves have become more complex. It didn't use to matter whether modules had gold or tin leads, but now some motherboards insist on gold leads or the module won't work. Well, without this kind of inconsistency, where would all the fun be in the industry?

As for the future, it doesn't stop here: NEC recently revealed plans for a 4Gbit chip, and Fujitsu showed a 256Mbit chip that can serve clock speeds faster than 100MHz and push system operation into the gigabit-per-second range.

But as if to underline the movement of the industry, there is now a growing number of competing technologies vying to replace the newest 'only just established' technologies.

Experts are concentrating on the memory interface that moves data from the memory core into the system pipeline. Up-and-coming solutions include double data rate (DDR) SDRam (also known as SDRam-II), Synclink DRam and Rambus DRam (RDRam).

Advocates of DDR SDRam are banking on the fact that because it is a logical extension of existing SDRam memory, its adoption would help maintain technological stability for the next three to five years. Seemingly on the outside, the Synclink DRam (SLDRam) camp is pushing for a more substantial change to standard chips.

Of them all, only the Rambus solution is really available today, and it is being backed by Intel. Its supporters plan to design the next generation of Rambus dynamic memory, NDRam, which could reach a transfer rate of 1.6Gb a second. Word is that it should be fully adopted by 1999, if the others don't prove their worth first.

So next time you are up the pub and at a loss for something to say, start a conversation about memory: it might be interesting. If you can remember it, that is.