Using Qnet for Transparent Distributed Processing

This chapter includes:

What is Qnet?
When should you use Qnet?
Conventions for naming nodes
Software components for Qnet networking
Starting Qnet
Checking out the neighborhood
Troubleshooting

What is Qnet?

A Neutrino native network is a group of interconnected workstations running only Neutrino. In this network, a program can transparently access any resource -- whether it's a file, a device, or a process -- on any other node (a computer or a workstation) in your local subnetwork. You can even run programs on other nodes.

The Qnet protocol provides transparent networking across a Neutrino network; Qnet implements a local area network that's optimized to provide a fast, seamless interface between Neutrino workstations, whatever the type of hardware.

For QNX 4, the protocol used for native networking is called FLEET; it isn't compatible with Neutrino's Qnet.

In essence, the Qnet protocol extends interprocess communication (IPC) transparently over a network of microkernels -- taking advantage of Neutrino's message-passing paradigm to implement native networking.

When you run Qnet, entries for all the nodes in your local subnetwork that are running Qnet appear in the /net namespace. (Under QNX 4, you use a double slash followed by a node number to refer to another node.)

For more details, see the Native Networking (Qnet) chapter of the System Architecture guide. For information about programming with Qnet, see the Transparent Distributed Processing Using Qnet chapter of the Programmer's Guide.

When should you use Qnet?

When should you use Qnet, and when TCP/IP or some other protocol? It all depends on what machines you need to connect.

Qnet is intended for a network of trusted machines that are all running Neutrino and that all use the same endianness. It lets these machines share all their resources with little overhead. Using Qnet, you can use the Neutrino utilities (cp, mv, and so on) to manipulate files anywhere on the Qnet network as if they were on your machine.

Because it's meant for a group of trusted machines (such as you'd find in an embedded system), Qnet doesn't do any authentication of requests. Files are protected by the normal permissions that apply to users and groups (see "File ownership and permissions" in Working with Files), although you can use Qnet's maproot and mapany options to control -- in a limited way -- what other users can do on your machine. Qnet isn't connectionless like NFS; network errors are reported back to the client process.

TCP/IP is intended for more loosely connected machines that can run different operating systems. TCP/IP does authentication to control access to a machine; it's useful for connecting machines that you don't necessarily trust. It's used as the base for specialized protocols such as FTP and Telnet, and can provide high throughput for data streaming. For more information, see the TCP/IP Networking chapter in this guide.

NFS was designed for filesystem operations between hosts of any type and any endianness, and is widely supported. It's a connectionless protocol; the server can shut down and be restarted, and the client resumes automatically. It also uses authentication and controls directory access. For more information, see "NFS filesystem" in Working with Filesystems.

Conventions for naming nodes

In order to resolve node names, the Qnet protocol follows certain conventions:

node name

A character string that identifies the node you're talking to. This name must be unique in the domain and can't contain slashes or periods.

The default node name is the value of the _CS_HOSTNAME configuration string. If your hostname is localhost (the default when you first boot), Qnet uses a hostname based on your NIC hardware's MAC address, so that nodes can still communicate.

node domain

A character string that npm-qnet.so adds to the end of the node name. Together, the node name and node domain must form a string that's unique for all nodes that are talking to each other. The default is the value of the _CS_DOMAIN configuration string.

fully qualified node name (FQNN)

The string formed by concatenating the node name and node domain. For example, if the node name is karl and the node domain name is qnx.com, the resulting FQNN is karl.qnx.com.

network directory

A directory in the pathname space implemented by npm-qnet.so. Each network directory -- there can be more than one on a node -- has an associated node domain. The default is /net, as used in the examples in this chapter.

The entries in /net for nodes in the same domain as your machine don't include the domain name. For example, if your machine is in the qnx.com domain, the entry for karl is /net/karl; if you're in a different domain, the entry is /net/karl.qnx.com.

name resolution

The process by which npm-qnet.so converts an FQNN to a list of destination addresses that the transport layer knows how to get to.

name resolver

A piece of code that implements one method of converting an FQNN to a list of destination addresses. Each network directory has a list of name resolvers that are applied in turn to attempt to resolve the FQNN. The default is the Network Discovery Protocol, ndp. See the Utilities Reference for more information.
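The naming conventions above can be sketched in a few lines of shell. The helper names fqnn and net_entry are hypothetical (they aren't Qnet utilities), and example.org is only a placeholder domain. The sketch does plain string manipulation to show how the names fit together; it doesn't query npm-qnet.so or resolve anything.

```shell
#!/bin/sh
# Sketch only: fqnn and net_entry are hypothetical helpers, not part of Qnet.

# fqnn NODE DOMAIN
# Print the fully qualified node name. Reject node names that are empty or
# that contain a slash or a period, as the node name convention requires.
fqnn() {
    case "$1" in
        ''|*/*|*.*) echo "invalid node name: $1" >&2; return 1 ;;
    esac
    printf '%s.%s\n' "$1" "$2"
}

# net_entry NODE NODE_DOMAIN LOCAL_DOMAIN
# Print the entry under /net where NODE is expected to appear when viewed
# from a machine in LOCAL_DOMAIN (assumes the default network directory).
net_entry() {
    name=$(fqnn "$1" "$2") || return 1
    if [ "$2" = "$3" ]; then
        printf '/net/%s\n' "$1"        # same domain: the domain is omitted
    else
        printf '/net/%s\n' "$name"     # different domain: use the FQNN
    fi
}

net_entry karl qnx.com qnx.com         # prints /net/karl
net_entry karl qnx.com example.org     # prints /net/karl.qnx.com
```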

Software components for Qnet networking

You need the following software entities (along with the hardware) for Qnet networking:

Qnet framework

Components of Qnet.

io-net

Manager to provide support for dynamically loaded networking modules.

Network drivers (devn-*)

Managers that form an interface with the hardware.

npm-qnet.so

Native network manager to implement Qnet protocols. Neutrino currently includes these versions:

npm-qnet-compat.so -- the original stack.
npm-qnet-l4_lite.so -- the new, lightweight version, which is faster and more reliable. This version of the Qnet stack isn't compatible with the earlier version with regard to packet and protocol format.

By default, npm-qnet.so is a symbolic link to the latest version of the Qnet protocol stack. To determine which version you're using, type:

ls -l /lib/dll/npm-qnet.so

If any conflict arises, see "Troubleshooting," later in this chapter.
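If you want to compare the stack version on several machines, it can help to reduce the ls -l output to just the name of the file that the link points to. The following is a minimal sketch; link_target is a hypothetical helper name, and it assumes that ls -l shows a symbolic link in the usual "name -> target" form.

```shell
#!/bin/sh
# Sketch only: link_target is a hypothetical helper, not a Neutrino utility.

# link_target
# Read one line of `ls -l` output on stdin and print the file name that the
# symbolic link points to. Prints nothing if the line doesn't describe a link.
link_target() {
    # Keep only the text after " -> ", then drop any leading directory part.
    sed -n 's/^.* -> //p' | sed 's|^.*/||'
}

# Typical use on each machine:
#   ls -l /lib/dll/npm-qnet.so | link_target
```

Running this on each node and comparing the results shows whether the nodes use the same version of the stack.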

Starting Qnet

You can start Qnet by:

creating a useqnet file, then rebooting
or:
explicitly starting the network manager, protocols, and drivers.

If you run Qnet, anyone else on your network who's running Qnet can examine your files and processes, if the permissions on them allow it. For more information, see:

"File ownership and permissions" in the Working with Files chapter in this guide
"Qnet" in the Securing Your System chapter in this guide
"Autodiscovery vs static" in the Transparent Distributed Processing Using Qnet chapter of the Neutrino Programmer's Guide.

Creating `useqnet`

To start Qnet automatically when you boot your system, log in as root and create an empty useqnet file, like this:

touch /etc/system/config/useqnet

For more information about what happens when you boot your system, see Controlling How Neutrino Starts.

Starting the network manager, protocols, and drivers

The io-net manager is a process that plays the central role: it loads a number of shared objects, provides the framework for the entire protocol stack, and lets data pass between modules. In the case of native networking, the shared objects are npm-qnet.so and networking drivers, such as devn-ppc800-ads.so. The shared objects are arranged in a hierarchy, with the end user on the top, and hardware on the bottom.

You can start io-net from the command line, telling it which drivers and protocols to load:

$ io-net -del900 -p npm-qnet &

This causes io-net to load the devn-el900.so Ethernet driver and the Qnet protocol stack.

Or, you can use the mount and umount commands to start and stop modules dynamically, like this:

$ io-net &
$ mount -Tio-net devn-el900.so
$ mount -Tio-net npm-qnet.so

To unload the driver, type:

umount /dev/io-net/en0

You can't unmount a protocol stack such as TCP/IP or Qnet.

Checking out the neighborhood

Once you've started Qnet, the /net directory includes an entry for each of the other nodes on your local subnetwork that are running Qnet. You can access files and processes on other machines as if they were on your own computer (at least as far as the permissions allow).

For example, to display the contents of a file on another machine, you can use less, specifying the path through /net:

less /net/alonzo/etc/TIMEZONE

To get system information about all of the remote nodes that are listed in /net, use pidin with the net argument:

$ pidin net

You can use pidin with the -n option to get information about the processes on another machine:

pidin -n alonzo | less

You can even run a process on another machine, using your console for input and output, by using the -f option to the on command:

on -f alonzo date
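Because every node shows up as an entry in /net, you can repeat a command such as pidin -n for each node with a simple loop. The following is a minimal sketch; for_each_node is a hypothetical helper name, and it assumes that every entry in the network directory is a node name.

```shell
#!/bin/sh
# Sketch only: for_each_node is a hypothetical helper, not a Neutrino utility.

# for_each_node NETDIR COMMAND [ARGS...]
# Run COMMAND once per entry in NETDIR, with the node name appended as the
# last argument. Fails if NETDIR doesn't exist.
for_each_node() {
    if [ $# -lt 2 ]; then
        echo "usage: for_each_node netdir command [args...]" >&2
        return 2
    fi
    netdir=$1
    shift
    if [ ! -d "$netdir" ]; then
        echo "$netdir not found; Qnet doesn't appear to be running" >&2
        return 1
    fi
    for entry in "$netdir"/*; do
        [ -e "$entry" ] || continue    # skip the unexpanded pattern if NETDIR is empty
        "$@" "${entry##*/}"            # pass only the node name, not the full path
    done
}

# Typical use, equivalent to running "pidin -n NODE" for each node:
#   for_each_node /net pidin -n
```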

Troubleshooting

All the software components for the Qnet network should work in unison with the hardware to build a native network. If your Qnet network isn't working, you can use various Qnet utilities to fetch diagnostic information to troubleshoot your hardware as well as the network. Some of the typical questions are:

Is Qnet running?
Are io-net and the drivers running?
Is the Qnet protocol stack or Ethernet driver installed?
Is the network card functional?
How do I get diagnostic information?
Is the Qnet version correct?
Is the hostname unique?
Are the nodes in the same domain?

Is Qnet running?

Qnet creates the /net directory. Use the following command to make sure that it exists:

$ ls /net

If the directory doesn't exist, Qnet isn't running. Ideally, the directory should include at least an entry with the name of your machine (i.e., the output of the hostname command). If you're using the Ethernet binding, all other reachable machines are also displayed. For example:

joseph/ eileen/

Are `io-net` and the drivers running?

As mentioned earlier, io-net is the framework that connects drivers and protocols. To check that it's running and to see which modules it has loaded, use the following pidin command:

$ pidin -P io-net mem

Look for the Qnet shared object in the output:

pid tid name               prio STATE           code  data    stack

86034   1 sbin/io-net    10o SIGWAITINFO       56K  684K  8192(516K)*
86034   2 sbin/io-net    10o RECEIVE           56K  684K   4096(68K)
86034   3 sbin/io-net    10o RECEIVE           56K  684K   4096(68K)
86034   4 sbin/io-net    10o RECEIVE           56K  684K   4096(68K)
86034   5 sbin/io-net    20o RECEIVE           56K  684K  4096(132K)
86034   6 sbin/io-net    10o RECEIVE           56K  684K   4096(68K)
86034   7 sbin/io-net    21r RECEIVE           56K  684K  4096(132K)
86034   8 sbin/io-net    10r RECEIVE           56K  684K  4096(132K)
86034   9 sbin/io-net    10o RECEIVE           56K  684K  4096(132K)
86034  10 sbin/io-net    10o RECEIVE           56K  684K  4096(132K)

          ldqnx.so.2         @b0300000             312K   16K
          npm-tcpip.so       @b8200000             592K  144K
          devn-el900.so      @b82b8000              56K  4096
          devn-epic.so       @b82c7000              44K  4096
      npm-qnet-l4_lite.so    @b82d3000             132K   16K

If the output includes an npm-qnet shared object, Qnet is running.
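In a script, you might prefer to test for the shared object instead of reading the listing by eye. The following is a minimal sketch; qnet_loaded is a hypothetical helper name, and it simply looks for the string npm-qnet in whatever text it's given.

```shell
#!/bin/sh
# Sketch only: qnet_loaded is a hypothetical helper, not a Neutrino utility.

# qnet_loaded
# Succeed if the text on stdin mentions an npm-qnet shared object.
qnet_loaded() {
    grep 'npm-qnet' >/dev/null
}

# Typical use:
#   pidin -P io-net mem | qnet_loaded && echo "Qnet is loaded"
```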

Is the Qnet protocol stack or Ethernet driver installed?

To find out, list the contents of /dev/io-net:

$ ls /dev/io-net

Ideally, you should see the following output:

en0        ip0        ip_en      ipv6_en    qnet_en

The en0 entry represents the first (and only) Ethernet driver, and qnet_en represents the Qnet protocol stack.

Is the network card functional?

To determine whether or not the network card is functional, i.e. transmitting and receiving packets, use the nicinfo command. If you're logged in as root, your PATH includes the directory that contains the nicinfo executable; if you're logged in as another user, you have to specify the full path:

$ /usr/sbin/nicinfo

The output looks something like this:

3COM (90xC) 10BASE-T/100BASE-TX Ethernet Controller
  Physical Node ID ................. 000103 E8433F
  Current Physical Node ID ......... 000103 E8433F
  Media Rate ....................... 10.00 Mb/s half-duplex UTP
  MTU .............................. 1514
  Lan .............................. 0
  I/O Port Range ................... 0xA800 -> 0xA87F
  Hardware Interrupt ............... 0x7
  Promiscuous ...................... Disabled
  Multicast ........................ Enabled

  Total Packets Txd OK ............. 1283237
  Total Packets Txd Bad ............ 9
  Total Packets Rxd OK ............. 7923747
  Total Rx Errors .................. 0

  Total Bytes Txd .................. 82284687
  Total Bytes Rxd .................. 1612645356

  Tx Collision Errors .............. 34380
  Tx Collisions Errors (aborted) ... 0
  Carrier Sense Lost on Tx ......... 0
  FIFO Underruns During Tx ......... 0
  Tx deferred ...................... 83301
  Out of Window Collisions ......... 0
  FIFO Overruns During Rx .......... 0
  Alignment errors ................. 0
  CRC errors ....................... 0

You should take special note of the Total Packets Txd OK and Total Packets Rxd OK counters. If they're zero, the driver might not be working, or the network might not be connected. Verify that the Media Rate has been auto-detected correctly by the driver.
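The check on the two packet counters can be automated. The following is a minimal sketch; nic_counters is a hypothetical helper name, and it assumes that the counter lines are labelled exactly as in the sample output above, with the value as the last field on the line.

```shell
#!/bin/sh
# Sketch only: nic_counters is a hypothetical helper, not a Neutrino utility.

# nic_counters
# Read nicinfo output on stdin, print the two "OK" packet counters, and
# return nonzero if either one is zero or missing.
nic_counters() {
    awk '
        /Total Packets Txd OK/ { tx = $NF }
        /Total Packets Rxd OK/ { rx = $NF }
        END {
            printf "txd_ok=%d rxd_ok=%d\n", tx, rx
            exit !(tx > 0 && rx > 0)
        }'
}

# Typical use:
#   /usr/sbin/nicinfo | nic_counters || echo "check the driver and the cable"
```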

How do I get diagnostic information?

You can find diagnostic information in /proc/qnetstats. If this file doesn't exist, Qnet isn't running.

The qnetstats file contains a lot of diagnostic information that's meaningful mainly to Qnet developers. However, you can use grep to extract certain fields:

$ grep "compiled" /proc/qnetstats
**** Qnet compiled on Jun 25 2003 at 17:14:27 running on qnet02

or:

$ grep -e "ok" -e "bad" /proc/qnetstats
  txd ok       19415966
  txd bad      31
  rxd ok       10254788
  rxd bad dr   0
  rxd bad L4   0

If you need help getting Qnet running, our Technical Support department might ask you for this information.
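If you want a single counter for use in a script, you can match on its label. The following is a minimal sketch; qnet_counter is a hypothetical helper name, and it assumes that each counter line has the form shown in the sample output above: a label, some whitespace, and then the value as the last field.

```shell
#!/bin/sh
# Sketch only: qnet_counter is a hypothetical helper, not a Neutrino utility.

# qnet_counter LABEL
# Read qnetstats-style text on stdin and print the value of the counter whose
# label is exactly LABEL (for example, "txd bad"). Returns nonzero if no
# such counter is found.
qnet_counter() {
    awk -v want="$1" '
        {
            label = $0
            sub(/^[ \t]+/, "", label)           # strip leading whitespace
            sub(/[ \t]+[^ \t]+$/, "", label)    # strip the value field
            if (label == want) { print $NF; found = 1; exit }
        }
        END { exit !found }'
}

# Typical use:
#   qnet_counter "txd bad" < /proc/qnetstats
```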

Is the Qnet version correct?

Since Neutrino includes two versions of Qnet stacks that are incompatible in regard to packet format, a conflict could arise, and native networking might not work. If this happens, make sure that npm-qnet.so is a symbolic link to the same version of the Qnet protocol stack on both machines. For more information, see "Software components for Qnet networking," earlier in this chapter.

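You can also use the ping command to verify that everything else (such as the network cards and the TCP/IP protocol) is working. Replace remote_host with the TCP/IP hostname or IP address of the other machine:

$ ping remote_host

If ping works, it's likely that the only problem lies with the versions of Qnet.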

Is the hostname unique?

Use the hostname command to see the hostname. This hostname must be unique within its domain for Qnet to work.

Are the nodes in the same domain?

If the nodes aren't in the same domain, you have to specify the domain. For example:

ls /net/kenneth.qnx.com