An interesting interview with the father of the Niagara 1 architecture, Kunle Olukotun, in The Register. It is about 60 minutes long, and below are the notes I took while listening to it. It also goes a bit into the life and career of Kunle himself.


Main pointers from the interview:

  • doing parallel programming research with the example of 3D worlds is a good idea: 3D worlds will be, according to Kunle, the next version of the web (although SecondLife is a laughable example, not to mention OpenSim, which runs even slower and lacks features)
  • the holy grail of MC technology is hidden in a Common Parallel Runtime System that magically solves all problems through dynamic and static program optimisation for multi-core chips
  • abstracting way above CUDA and current GPGPU technologies is essential for mainstream uptake of multicore technologies
  • the technology must be architecture independent, so that the source code of programs can be re-mapped to a different parallel architecture once it becomes available
  • providing domain-specific languages that can be used to exploit parallelism in an automated fashion
  • the current OS will have to disappear once we reach 100+ cores (2009, with the Rock architecture!)
  • cloud-computing: inevitable, and more and more pervasive. 

Some unedited notes I took from the audio interview, which may help if you go through the hassle of actually listening to it.


Kunle Olukotun, born in London in the mid-1960s to a Nigerian family. Spent some time back home as a teenager. Finished his PhD in 1991 on multiprocessors and timing analysis tools, applied for a job at Stanford and has worked there since. Speaks really fast and with ease about all the technical details the host seems not to have a clue about (side questions never follow what has been discussed).

The ideas of MC did not catch on initially, but Kunle thought MC was really well suited for multi-threaded apps. Thread-level speculation for parallelising single-threaded apps. Hydra research goes back to 1999. Initially the MC was meant for embedded systems. Later, after talking with Google/Yahoo, lots of concerns about power and heat in data centers. Using slower clocks, but moving towards more MC designs.

Talking about Stanford's entrepreneurial culture. A new processor for the server market seemed a good idea. Sequoia Capital. Les Kohn comes on board. Starting group of 5 engineers, eventually grew to 80. Les was CTO for Afara. The original Niagara 1 was a completely new design done by Afara. Afara started in 2000, acquired by Sun at the end of 2002. 90nm T1 done later. The initial plan was to ship the boxes (not to be acquired). A bit less than $100M deal with Sun. Now over $1B/year business for Sun. Kunle confesses to being an academic at heart. Keen to go after new ideas, new projects.

Programming is a top priority now: how to program the MC chips. Not much parallelism on the workstation: different business objectives for Intel/AMD. Not many players going as aggressively towards the MC market as Sun. IBM not giving up on power: mainframes tend to be less threaded (perhaps). Power6? Overkill? Odd man? IBM VP: customers need more throughput, not necessarily lower latency (at odds with what 2 cores at 5GHz offer). Intel Itanium mentioned as a failure. A good fit for database requirements. More strands coupled by caches. Improving performance per Watt. Rock has been delayed till 2009.

The next big thing? How to program the new chips: software. Desktop apps are really difficult. Parallel programming a common case. “I do not like parallel programming even though I teach that stuff”. Kunle thinks the future is in programmers programming sequentially, as they do now. Parallel programming development must be seamless for the average programmer.

“Virtual World – next generation web”, AI, graphics rendering, physics, etc – all highly parallel, lots of interest in various CS topics.

Other good applications for HPC multicores: autonomous vehicles, various forms of AI algorithms, sensing, etc.

Huge dataset analysis.

Making fun of SecondLife performance ;o)

Idea: domain-specific languages. Customize the language the programmer uses so that it matches what is needed within a given domain, so that the implicit parallelism can then be exploited in an automated fashion.
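To make the DSL idea a bit more concrete, here is a minimal sketch in Python. It is my own illustration, not anything shown in the interview: the `Pipeline` class and its `map`/`run` methods are hypothetical names, and a thread pool merely stands in for a real parallel backend (in CPython, threads will not speed up CPU-bound work). The point is only that the programmer describes element-wise steps sequentially, and the runtime is free to run them concurrently because the items are independent.

```python
from concurrent.futures import ThreadPoolExecutor


class Pipeline:
    """Hypothetical mini-DSL: a chain of element-wise steps over a collection."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def map(self, fn):
        # Only describe the step; nothing is executed yet.
        return Pipeline(self.steps + (fn,))

    def _apply(self, item):
        # Run every step, in order, on a single item.
        for fn in self.steps:
            item = fn(item)
        return item

    def run(self, data, workers=1):
        # Steps are element-wise, so items are independent of each other and
        # the runtime may process them one by one or concurrently.
        if workers <= 1:
            return [self._apply(x) for x in data]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map returns results in input order.
            return list(pool.map(self._apply, data))


# The "program" reads sequentially and never mentions threads.
scale_then_shift = Pipeline().map(lambda x: x * 2).map(lambda x: x + 1)

serial = scale_then_shift.run(range(5))
parallel = scale_then_shift.run(range(5), workers=4)
```

Both calls return the same list; only the `workers` argument passed to the runtime differs, while the pipeline description stays untouched.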

GPGPU, CUDA: is it domain-specific enough? Too low level. The GPU level is fine, and CUDA is fine too, but it is still too low level for an average programmer. We are after high-level programming languages that make it easy for the programmers. The details of the architectures should be irrelevant. Threading, memory management etc.: this should all be happening behind the scenes. The env will have a number of domain-specific languages (DSLs) plus tools to create new ones.

The holy grail: *common parallel runtime system* (!!) Used for both static and dynamic program optimisations. Dynamic compiler, feedback from the architecture (speculative execution?). Goal: 3 years of funding for the lab with the DSL environment, etc. (talking about his lab). Working on new architectures too (for the feedback? coupling software and hardware systems?). Heterogeneous MC are inevitable. Coupling GPUs with a single fast core and a high number of low-powered strands for high throughput, etc. (wow, the host is so annoying!)

The industry is not as forward-looking as the research labs and academia. CUDA is low-level parallel programming. What the lab is after is a level above that. Over time the underlying architecture is going to change, therefore the code should be architecture independent. The more abstract the program is, the easier it gets to optimise it.
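A rough sketch of what "architecture independent" could mean in practice, again in Python and again purely my own illustration: the program is stated as an abstract map plus reduce, and a toy runtime maps it onto one of several backends. The names `run`, `BACKENDS`, `run_serial` and `run_threads` are hypothetical, the size threshold of 1000 is an arbitrary assumption, and the thread pool is only a stand-in for a genuinely different parallel architecture.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def run_serial(map_fn, reduce_fn, items, initial):
    # Reference backend: a plain sequential map followed by a fold.
    return reduce(reduce_fn, (map_fn(x) for x in items), initial)


def run_threads(map_fn, reduce_fn, items, initial, workers=4):
    # Chunked backend: fold each chunk on a worker, then combine the partials.
    # Assumes reduce_fn is associative and `initial` is its identity element.
    size = max(1, -(-len(items) // workers))  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(
            pool.map(lambda c: run_serial(map_fn, reduce_fn, c, initial), chunks)
        )
    return reduce(reduce_fn, partials, initial)


# Hypothetical registry: adding a new architecture means adding an entry here,
# without touching the programs that call run().
BACKENDS = {"serial": run_serial, "threads": run_threads}


def run(map_fn, reduce_fn, data, initial, backend=None):
    items = list(data)
    if backend is None:
        # Toy "dynamic optimisation": choose a backend from the input size.
        backend = "threads" if len(items) >= 1000 else "serial"
    return BACKENDS[backend](map_fn, reduce_fn, items, initial)


def square(x):
    return x * x


# One abstract program (sum of squares), re-mapped to different backends.
total_serial = run(square, operator.add, range(10), 0, backend="serial")
total_threads = run(square, operator.add, range(10), 0, backend="threads")
total_auto = run(square, operator.add, range(2000), 0)
```

Because the program only says "map, then reduce", the runtime knows the items are independent and the reduction can be regrouped, which is exactly the kind of freedom a lower-level formulation with explicit threads would hide.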

What about RapidMind and PeakStream (taken up by Google)? Kunle is not familiar with RapidMind: a set of libraries available for the programmer. A somewhat vague answer.

The pervasive parallelism lab: the key is how to make the job of the programmer easier. Absolute performance is not the key. 8 cores not really needed for desktop code? As more cores are added, the operating system must fundamentally change. Traditional OS view: designed for scarce resources. The old model does not apply anymore. The new model is to take the OS out of the way: let the apps become the basic concept, let the apps run directly on the hardware. With 100s of cores, let the apps just run on bare metal. Microsoft vs Solaris vs Linux capabilities in handling multiple cores: LOLs.

The parallel lab is open, anyone can join, all the software is open, all IP is open. Talking about competition between labs etc. Giggling about Microsoft agreeing to a BSD licence under the Berkeley lab agreement.

Future: using massive parallelism for AI algorithms. Relationship to Jeff Hawkins on intelligence (and Numenta technology? Hierarchical Temporal Memory (HTM)?)

Autonomous vehicle racing in the desert: impressive. ML was the key for the Stanford and Carnegie Mellon teams.

Can we program computers by letting them learn how to solve a given, well-defined problem? Programming would then end up being a matter of specifying what needs to be discovered or solved.

Hyperscale data centers, cloud computing?

Traditional clusters: power and cooling cost more than the machines themselves. On-demand computing is a much more viable alternative.

The pendulum swings between what is on the workstation and what is in the datacentre. Big benefits of having everything on demand, in the cloud. Mobile devices become much more capable: high-definition interface.

Lab participation – anyone can reach Kunle ($100k for companies) – no individuals – all processes are company-based.

What $100k will get you: interactions with the members of the team, early releases of the software, and invitations to 2-3 research retreats a year.


