Electronical and Electromechanical Explorations

This blog organizes and presents some of my various projects and musings related to taming wild electrons and putting them to work. Projects are listed down the right side of the page.

Friday, January 9, 2015

Oscilloscope: Architecture

As promised, here is a sketch of an architecture for a digital sampling oscilloscope.  I want to play with this architecture with a view toward building one for my lab, primarily as a pleasant hobby / learning experience.  I'm not sure that it's appropriate for all levels of oscilloscope implementation -- the traditional DSO architecture model is pretty good for low-capability hardware, and I may end up using that for the oscilloscope watch project I'm noodling around with.  In practice I am sure it will get tweaked, whether for performance optimizations, to take advantage of hardware resources, or to support certain features.  And finally, this architecture only covers the basic processing flow of a single channel and doesn't include any info about auxiliary features -- which aren't really relevant to the central model.

This design sketch is mostly high-level but contains a few bits of detail that I thought were interesting and relevant... not very rigorous, but eh...

One last thing:  This architecture is "soft" in nature -- implemented as software running on microcontrollers and/or larger processors, as configurations in FPGA chips, or on a GPU.  Which parts go where is TBD and may depend on the requirements of a particular implementation.  The analog front end is of course very important and interesting, but is beyond the scope of this discussion.

Basic points of the architecture:
  • Separate processing into three parallel components:  sampling, triggering, and display
  • Samples are stored in a very large hierarchical circular memory
  • All triggering occurs by processing sampled data (rather than in analog circuitry)
  • Display functions reside in highly parallel line/polygon-drawing hardware

Sampling

As a first pass, sampling always occurs at a single (maximum) rate.  In practice that might be modified in unusual cases requiring very long data sets, such as multiple seconds per div or persistence times longer than 5-10 seconds or so -- but for now consider the sampling rate to be fixed.  Two reasons for this:
  • Retaining full-rate data means maximum zooming is always available
  • Memory is (relatively) cheap, at least in the quantities needed.  16GB of DDR3 SDRAM costs about $100, so memory cost per se is not a major consideration.

Regarding SDRAM, DDR3 supports a bandwidth of more than 10 GB/sec and DDR4 is significantly higher, so a single memory is probably sufficient, but a "striped" array of memories could be used if needed.  I don't have the skills to construct circuitry to process multiple-GSPS systems anyway, but I am confident this memory-intensive architecture could scale if needed.

Sample memory is stored in a hierarchical structure (see figure).  The top level contains the raw samples.  Subsequent layers correspond precisely to user-selectable timebases to avoid scanning large amounts of memory to create low-time-resolution waveform displays.  The downsampled buffers contain more than single values -- probably <min, max, mean> from the corresponding data higher in the hierarchy.  Using this extra information allows rich detailed displays that avoid aliasing errors.

The sample hierarchy is created as the samples come in, probably in an FPGA front end using parallel resamplers if necessary.
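To make the hierarchy concrete, here is a minimal Python sketch of the downsampled-buffer idea described above: each level summarizes a fixed number of entries from the level above as a <min, max, mean> triple. The function names and the decimation factor are my own illustrative choices, not anything specified in the design.

```python
DECIMATION = 4  # samples summarized per bucket at each level (hypothetical choice)

def _buckets(seq):
    """Split a sequence into consecutive DECIMATION-sized chunks."""
    return [seq[i:i + DECIMATION] for i in range(0, len(seq), DECIMATION)]

def build_hierarchy(raw, levels):
    """Return [raw, level1, level2, ...] where each level is a list of
    (min, max, mean) tuples summarizing the level above it."""
    hierarchy = [raw]
    # Level 1 summarizes raw scalar samples directly.
    current = [(min(b), max(b), sum(b) / len(b)) for b in _buckets(raw)]
    hierarchy.append(current)
    for _ in range(levels - 1):
        # Deeper levels merge <min, max, mean> triples from the level above:
        # min of mins, max of maxes, mean of means (buckets are equal-sized).
        nxt = []
        for b in _buckets(current):
            mins, maxs, means = zip(*b)
            nxt.append((min(mins), max(maxs), sum(means) / len(means)))
        hierarchy.append(nxt)
        current = nxt
    return hierarchy
```

A real implementation would build these levels incrementally as samples arrive rather than in a batch, but the aggregation rule is the same.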

Triggers

Triggers are stored as a list of pointers computed by (probably parallel banks of) data scanners -- initially as data is sampled, and subsequently if trigger criteria are changed.  During trigger recomputation the hierarchy can be traversed bottom-up to focus on potential trigger points.

Computing (for example) a persistent display involves traversing the relevant triggers in the list and accessing the appropriate corresponding sample memory.

Triggers can be arbitrarily complex and since they are largely independent can be computed in parallel.  For real-time display implementing them with an FPGA probably makes the most sense, but when operating on a snapshot they can use more complex algorithms (at a cost in time).  For example, an "anomalous waveform" trigger could gather statistics about every waveform in the time frame of interest, then compute some distance metric between an average and each waveform in turn.  Similarly, algorithmically detecting "runt pulses" could create a trigger list of all runts in the sample memory, after which a persistent display could show all of them at once to get a view of their characteristics.  The huge sample memory makes this type of detailed post-analysis possible.
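As a toy illustration of the data-scanning approach, here are two scanners that walk a sample buffer and emit a list of trigger indices ("pointers"): a plain rising-edge trigger and the runt-pulse detector mentioned above. Thresholds and function names are invented for this sketch; a real-time version would live in FPGA fabric rather than Python.

```python
def rising_edge_triggers(samples, threshold):
    """Return indices where the signal crosses `threshold` going upward."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] < threshold <= samples[i]]

def runt_triggers(samples, low, high):
    """Return start indices of pulses that rise above `low` but never reach `high`."""
    runts, start = [], None
    for i, v in enumerate(samples):
        if start is None and v >= low:
            start = i                      # pulse begins
        elif start is not None and v < low:
            if max(samples[start:i]) < high:  # pulse never made it to `high`
                runts.append(start)
            start = None                   # pulse ends
    return runts
```

Either list can then drive a persistent display by indexing back into the sample memory at each recorded pointer.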

Summary features of the sample/trigger representation:
  • Full data is always available (zooming in to captured data always provides maximum detail)
  • Trigger points quickly accessible and easily recomputable for different views on the data
  • Since all data is captured continuously, there is zero dead time
  • Information required for optimum display at any timebase is immediately available; this information avoids aliasing errors in display

Display

A simple snapshot display of captured data is obviously trivial:  just read the data from the appropriate level of the hierarchy and display it. Of more interest, though, are displays that combine the data from multiple triggers.  The analog-inspired "persistent" display is one such; others might include displaying an "average" waveform (itself perhaps superimposed over individual waves), displaying "variation bands" (min/max) of all triggers, etc.
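Two of the aggregate displays mentioned above are easy to sketch once we have a trigger list: extract a fixed-width window of samples after each trigger, then reduce the windows column-by-column. The window width and helper names here are assumptions for illustration.

```python
def aligned_windows(samples, triggers, width):
    """Extract a fixed-width window of samples starting at each trigger point."""
    return [samples[t:t + width] for t in triggers
            if t + width <= len(samples)]

def average_waveform(windows):
    """Per-point mean across all triggered windows ("average" display)."""
    n = len(windows)
    return [sum(col) / n for col in zip(*windows)]

def variation_bands(windows):
    """Per-point (min, max) across all triggered windows ("variation bands")."""
    return [(min(col), max(col)) for col in zip(*windows)]
```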

With the large sample buffer, operating on snapshot data instead of real-time is just fine for many analysis use-cases. 

In addition to the flexibility of display, aggregate-display modes such as persistence will be available if desired while scrolling and zooming through the sample buffer, though probably with some degree of lag depending on the number of triggers to be displayed.

Displaying a Million Waveforms per Second

Creating a display in real-time combining a million triggered waveforms seems like a daunting task, but it is pretty easy to throw hardware at the problem.  Slightly oversimplified: if a waveform consists of a sequence of N sampled data points <t, value>, it is trivial to break this up into N-1 line segments connecting the sequential data points.  For each one:
  1. Convert the <t, value> endpoints into screen coordinates <x, y>
  2. Draw a line segment between the two points
If we choose the algorithms with a bit of care, the drawing processes for each segment of each waveform are completely independent of each other, which means they can be run in parallel... and that's a LOT of parallelism!  There are two basic ways to parallelize this.  The easiest is to just distribute the segments to a set of processing elements without caring where they go; if each of the line-drawers has equal access to the entire display buffer, that should work fine.

For some computing hardware, that kind of equal access might not be possible, though.  For example, if using an FPGA we might be able to store the display memory in scattered bits of "block RAM" inside the chip itself, which makes access really fast -- but we'd only want to allow certain nearby computation units to have access to the block RAM for a particular screen region.  In this case, it would be better to have dispatchers sending individual segments to the appropriate "processors" (possibly breaking each segment into smaller pieces first).  That is an implementation detail, though... the point is that it will be pretty easy to display enormous quantities of data using massively parallel line-drawers.  Design options include FPGA fabric and conventional PC graphics cards (among other choices).
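The two steps above can be sketched sequentially in a few lines: map <t, value> points to screen coordinates, split them into segments, and accumulate each segment into a shared intensity buffer (which is exactly what a persistence display builds up). The display size and the naive interpolated line-stepper are illustrative placeholders; a real design would dispatch the independent segments to parallel drawing units.

```python
WIDTH, HEIGHT = 64, 32  # hypothetical display resolution

def to_screen(points, t0, t1, v0, v1):
    """Map (t, value) pairs into integer pixel coordinates for the visible window."""
    return [(round((t - t0) / (t1 - t0) * (WIDTH - 1)),
             round((v - v0) / (v1 - v0) * (HEIGHT - 1)))
            for t, v in points]

def segments(coords):
    """Each adjacent pair of points is an independently drawable segment."""
    return list(zip(coords, coords[1:]))

def draw_segment(buf, p0, p1):
    """Accumulate one line segment into the intensity buffer (naive stepping)."""
    (x0, y0), (x1, y1) = p0, p1
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    for s in range(steps + 1):
        x = round(x0 + (x1 - x0) * s / steps)
        y = round(y0 + (y1 - y0) * s / steps)
        buf[y][x] += 1  # brightness = how many waveforms hit this pixel
```

Because `draw_segment` only reads its own endpoints, every segment of every waveform can be handed to a different processing element.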

One important wrinkle worth mentioning:  As stated above, the drawing uses linear interpolation.  In practice, it will be desirable to use sin(x)/x interpolation instead, especially at maximal zoom levels.  From what I can tell so far, this is mainly a filtering problem and breaks up into parallel tasks just as easily as the line drawing itself, so it shouldn't be a major issue.
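For reference, the sin(x)/x reconstruction can be written as a direct (unoptimized) windowed sum over nearby samples. The tap count and truncation here are arbitrary sketch parameters, not a tuned filter design.

```python
import math

TAPS = 8  # samples on each side contributing to one output point (assumption)

def sinc(x):
    """Normalized sinc: sin(pi*x)/(pi*x), with the limit value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def interpolate(samples, t):
    """Reconstruct the signal at fractional sample position t."""
    n0 = int(math.floor(t))
    total = 0.0
    # Truncated sinc sum over the nearest 2*TAPS+1 samples.
    for n in range(max(0, n0 - TAPS), min(len(samples), n0 + TAPS + 1)):
        total += samples[n] * sinc(t - n)
    return total
```

Since each output point depends only on a small neighborhood of samples, output points can be computed in parallel just like the line segments.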

So that's the basic architecture! 

Next I want to look into whether I can use a version of this architecture for my goofy oscilloscope watch (I suspect not).  Otherwise, it's time to start planning the design of a "real" oscilloscope for use on my bench!





