Which way to go? QNX or Solaris?

bridged with qdn.public.qnxrtp.advocacy
Dmitri Ivanov

Re: Which way to go? QNX or Solaris?

Post by Dmitri Ivanov » Fri Oct 04, 2002 6:14 am

Hey! Igor seems to be the only one who was helpful on the subject!
He gave me a good overview, so what if some details were wrong?
I can always look them up in the docs. After all, who can remember
every detail on every platform? (except OS developers, maybe).
Many thanks, Igor. I really appreciate your follow-up.

David Donohoe wrote:
Although you like to sound like an authority on Solaris and QNX
driver development, it's looking like you have little real driver
development experience, due to the lack of basic understanding that
you exhibit.

Your apparent expertise sounds particularly lame to someone who
has spent 2 years writing Solaris drivers for Sun Microsystems (I
wrote the elxl driver) and 4.5 years writing drivers for QSSL.

Perhaps you should post on slashdot instead: what you post does not
need to be accurate, but you'll get the advantages of reaching a wider
audience, while not distracting QSSL staff.

Dave

Mario Charest

Re: Which way to go? QNX or Solaris?

Post by Mario Charest » Fri Oct 04, 2002 12:52 pm

"Chris McKillop" <cdm@qnx.com> wrote in message
news:anikb4$lv$1@nntp.qnx.com...
Eh, thank you Kris.
On a second thought, this is .advocacy group after all. Perhaps we don't
have to care *here* if QNX staff is distracted ;)


That is what I was going to say. This is the area where the gloves
come off. Check your feelings at the door. ;)

I agree 125%. Personally I view this group as the only one where we can
speak our minds. Obviously QNX staffers don't have total freedom in
that regard because they represent QSS. Plus there is ALWAYS
more than meets the eye. I tend to believe coins have more than
2 sides (a reference to the "other side of the coin" expression).
That being said, I welcome David's post on the grounds that he said
it like he felt it; that's good. Same for Igor.

However, unlike Chris, I would say don't leave your feelings at the door; bring
them in, make yourself vulnerable. There is so much to learn.

Curiously, it seems humans grow/evolve the fastest when pain and suffering are
involved ;-|


chris


--
Chris McKillop <cdm@qnx.com> "The faster I go, the behinder I get."
Software Engineer, QSSL -- Lewis Carroll --
http://qnx.wox.org/

Rennie Allen

Re: Which way to go? QNX or Solaris?

Post by Rennie Allen » Fri Oct 04, 2002 1:32 pm

Chris McKillop wrote:
Igor Levko <spama@nihrena.net> wrote:

Let's get back to the source. Igor's original statement was
"That manifests itself as system not doing anything except for spitting
'Out of interrupt events' on the screen continuously. The end result is,
you have to reset it. It is as good as a crash."

As far as I can tell, this is only going to happen when you use
InterruptAttach() and your ISR does not call InterruptMask() before returning
the event to the thread. In this case the ISR will continue to fire,
and depending on timings, the handling thread may never get to run. This is
a bug in the driver. And it just shows igor's original statement, that when
you touch hardware there is no "crash proof", to be true. It just happens that
QNX is 10x better at managing the cases where the hardware isn't screwed up
(null pointer reference).

QNX (at least QNX4) seems to just be better at protection, even when
comparing process to process (i.e. not even considering the in-kernel
driver case). At the last company I worked at, a developer wrote and
"debugged" a program on his NT workstation, that I was to incorporate
into a controller with no MMU. His code had a test scaffold built-in,
and, as per our SOP, I compiled and ran it on QNX4, just before
incorporating it into the controller (for FDA doc purposes). It
SIGSEGV'd immediately. The same code executed without error on NT.

I quickly re-ran it inside the debugger, and sure enough there was an
invalid (not null) pointer de-reference (which, incidentally, was in the
test scaffold and not any of the code to be incorporated into the
controller).

There are probably many reasons that NT didn't catch it (some of which
could be simple luck-of-the-draw - i.e. the value of the invalid
reference), but I have never had the opposite situation occur (i.e. NT
catch a bug in QNX code, that wasn't caught by QNX). I suspect that NT
provides a lot more "slack" backed VM to a process than QNX does.

Igor Levko

Re: Which way to go? QNX or Solaris?

Post by Igor Levko » Fri Oct 04, 2002 2:18 pm

Let's get back to the source. Igor's original statement was
"That manifests itself as system not doing anything except for spitting
'Out of interrupt events' on the screen continuously. The end result is,
you have to reset it. It is as good as a crash."

As far as I can see now, igor's explanation of the origin of this problem
was not correct. Obviously Donohoe's intention was not to explain it but
rather to tell us how cool he is and give some analysis of igor's
personality. That was really lame.

Anyway, would anybody care to explain it, or at least point to some docs?

cheers,
Igor

"Kris Warkentin" <kewarken@qnx.com> wrote in message
news:ani637$du8$1@nntp.qnx.com...
"Igor Kovalenko" <kovalenko@attbi.com> wrote in message
news:ani48a$l00$1@inn.qnx.com...
"David Donohoe" <ddonohoe@qnx.com> wrote in message
Perhaps you should post on slashdot instead: what you post does not
need to be accurate, but you'll get the advantages of reaching a wider
audience, while not distracting QSSL staff.


I am so sorry for distracting you. That won't happen again.

Lord knows I'm a fool for getting in the middle of this but I think that
it's unfair to characterize Igor as the type of person who tries to appear
knowledgeable just to hear himself talk. While he's sometimes abrasive,
(I admit a certain guilty pleasure in seeing him get flamed ;-) I believe
that his posts are generally accurate to the best of his ability and that
he's generally trying to contribute and help. He may have been mistaken this
time but I'm willing to bet that it doesn't happen that often.

Igor, I hope that you don't go away, as your last note implies, because I
think a lot of people would miss having you around. You're a valuable
resource both to the general public AND us qnx staff.

cheers,

Kris

Chris McKillop

Re: Which way to go? QNX or Solaris?

Post by Chris McKillop » Fri Oct 04, 2002 7:09 pm

Igor Levko <spama@nihrena.net> wrote:
Let's get back to the source. Igor's original statement was
"That manifests itself as system not doing anything except for spitting
'Out of interrupt events' on the screen continuously. The end result is,
you have to reset it. It is as good as a crash."

As far as I can tell, this is only going to happen when you use
InterruptAttach() and your ISR does not call InterruptMask() before returning
the event to the thread. In this case the ISR will continue to fire,
and depending on timings, the handling thread may never get to run. This is
a bug in the driver. And it just shows igor's original statement, that when
you touch hardware there is no "crash proof", to be true. It just happens that
QNX is 10x better at managing the cases where the hardware isn't screwed up
(null pointer reference).

chris

--
Chris McKillop <cdm@qnx.com> "The faster I go, the behinder I get."
Software Engineer, QSSL -- Lewis Carroll --
http://qnx.wox.org/
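
The failure mode Chris describes (an ISR that hands an event to its thread without masking a still-asserted line) can be made concrete with a toy model. This is plain Python, not QNX code: the tick-based scheduling, the `simulate` function and the `QUEUE_LIMIT` value are all assumptions made for illustration and say nothing about how procnto actually sizes or drains its event pool.

```python
from collections import deque

QUEUE_LIMIT = 8  # hypothetical size of the pool of pending interrupt events


def simulate(isr_masks, ticks=100):
    """Toy model of one level-triggered interrupt line.

    Returns (serviced, overflows): how often the handler thread ran and
    how often an event could not be queued.
    """
    line_asserted = True   # the device is requesting service
    masked = False         # whether the line is currently masked
    pending = deque()      # events waiting for the handler thread
    serviced = 0
    overflows = 0

    for _ in range(ticks):
        if line_asserted and not masked:
            # The ISR runs and preempts the thread for this tick.
            if len(pending) < QUEUE_LIMIT:
                pending.append("event")
            else:
                overflows += 1      # stand-in for "Out of interrupt events"
            if isr_masks:
                masked = True       # the role InterruptMask() plays in the ISR
        elif pending:
            # The handler thread finally runs: quiet the device, then unmask.
            pending.popleft()
            line_asserted = False
            masked = False
            serviced += 1
        else:
            # Idle tick: the device raises a new request.
            line_asserted = True

    return serviced, overflows


if __name__ == "__main__":
    print("ISR masks:        ", simulate(isr_masks=True))
    print("ISR does not mask:", simulate(isr_masks=False))
```

In this model the masking variant keeps servicing the device and never overflows, while the non-masking variant fills the queue and the thread never gets a tick, which matches the "handling thread may never get to run" symptom.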

Igor Kovalenko

Re: Which way to go? QNX or Solaris?

Post by Igor Kovalenko » Fri Oct 04, 2002 11:09 pm

I believe our case was different. All the drivers normally worked fine.
However occasional bugs in some drivers caused them to crash (SEGV).
That's where the 'out of interrupt events' came up, because there
apparently is a window for a driver to crash when the interrupt is not
masked (sorry, I am speculating again).

This whole story makes me wonder if the kernel could not mask the
interrupt before killing a SEGVed process that has an interrupt handler
attached... Could it? Good opportunity for the heroes to strike! :)

-- igor
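
The policy Igor is wondering about (mask a line when the process that handled it is killed) can be sketched as a toy model. Everything here is hypothetical: `ToyKernel`, `crash_scenario`, the IRQ number and the pid are invented for illustration, and the thread does not establish whether procnto already does something like this.

```python
class ToyKernel:
    """Toy model: which processes have a handler attached to which line."""

    def __init__(self, mask_orphaned_lines):
        self.mask_orphaned_lines = mask_orphaned_lines  # the proposed policy
        self.handlers = {}    # irq number -> set of attached pids
        self.masked = set()   # irq numbers currently masked

    def attach(self, irq, pid):
        self.handlers.setdefault(irq, set()).add(pid)

    def kill(self, pid):
        """Remove a dead process; optionally mask lines it leaves unhandled."""
        for irq, pids in self.handlers.items():
            if pid in pids:
                pids.remove(pid)
                if not pids and self.mask_orphaned_lines:
                    self.masked.add(irq)

    def storm_length(self, irq, ticks=50):
        """Count ticks on which a still-asserted line would interrupt the CPU."""
        return 0 if irq in self.masked else ticks


def crash_scenario(mask_orphaned_lines):
    kernel = ToyKernel(mask_orphaned_lines)
    kernel.attach(irq=5, pid=100)   # hypothetical driver on a hypothetical line
    kernel.kill(pid=100)            # driver SEGVs while the device still asserts
    return kernel.storm_length(irq=5)


if __name__ == "__main__":
    print("mask on death:", crash_scenario(True))
    print("no cleanup:   ", crash_scenario(False))
```

The model only masks a line once its last handler is gone, so a shared line with a surviving handler stays live. That is a design choice of the sketch, not a statement about the real kernel.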


Chris McKillop

Re: Which way to go? QNX or Solaris?

Post by Chris McKillop » Fri Oct 04, 2002 11:31 pm

All of our drivers should be using the proper flag values to have the
mask/unmask states tracked by the kernel. I know all the ones I have worked
on have done so. Have you actually seen this recently, igor, or was it with
an older procnto?

chris



--
Chris McKillop <cdm@qnx.com> "The faster I go, the behinder I get."
Software Engineer, QSSL -- Lewis Carroll --
http://qnx.wox.org/

Igor Kovalenko

Re: Which way to go? QNX or Solaris?

Post by Igor Kovalenko » Sat Oct 05, 2002 12:17 am

The most recent case, I believe, was a few months back. We are not talking
about your drivers only; there are 3rd party drivers involved (T1/E1
communication, DSPs, etc). They might not be using the proper flags...

What are those proper flags anyway? The _NTO_INTR_FLAGS_TRK_MSK has a
somewhat vague description. It is not clear that it will do what you
seem to imply - it talks about what happens when a shared interrupt is
detached. I am not sure a driver dying is the same as an interrupt detaching,
is it? And we're not talking about shared interrupts ...

-- igor



Chris McKillop

Re: Which way to go? QNX or Solaris?

Post by Chris McKillop » Sat Oct 05, 2002 12:32 am

Igor Kovalenko <kovalenko@attbi.com> wrote:
The most recent case, I believe, was a few months back. We are not talking
about your drivers only; there are 3rd party drivers involved (T1/E1
communication, DSPs, etc). They might not be using the proper flags...

Or they could be assuming the interrupt is masked for them. Are they
using InterruptAttach() or InterruptAttachEvent()?

Igor Kovalenko <kovalenko@attbi.com> wrote:
What are those proper flags anyway? The _NTO_INTR_FLAGS_TRK_MSK has a
somewhat vague description. It is not clear that it will do what you
seem to imply - it talks about what happens when a shared interrupt is
detached. I am not sure a driver dying is the same as an interrupt detaching,
is it? And we're not talking about shared interrupts ...

My understanding is that if you use that flag, it will do reference counts for
masking and unmasking. If you don't use it, then the kernel assumes you know
what you are doing and does what you ask. It might not be possible to get into
the state you are talking about with/without this flag, but I am always
wary of things not doing things right.

chris

--
Chris McKillop <cdm@qnx.com> "The faster I go, the behinder I get."
Software Engineer, QSSL -- Lewis Carroll --
http://qnx.wox.org/
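
Chris's reading of the flag ("reference counts for masking and unmasking") can be illustrated with a small model. This is only a sketch of that idea under stated assumptions: `TrackedLine` and `UntrackedLine` are hypothetical names, and the per-attachment bookkeeping shown here is a guess at what such tracking would look like, not a description of procnto's implementation.

```python
class TrackedLine:
    """Toy model of per-attachment mask counting on one interrupt line."""

    def __init__(self):
        self.counts = {}   # attachment id -> masks not yet undone

    def attach(self, att_id):
        self.counts[att_id] = 0

    def mask(self, att_id):
        self.counts[att_id] += 1

    def unmask(self, att_id):
        if self.counts[att_id] > 0:
            self.counts[att_id] -= 1

    def detach(self, att_id):
        # Dropping the entry undoes whatever masks this attachment still held.
        del self.counts[att_id]

    @property
    def masked(self):
        return any(n > 0 for n in self.counts.values())


class UntrackedLine:
    """Same line without bookkeeping: masks are anonymous and outlive their owner."""

    def __init__(self):
        self.depth = 0

    def mask(self):
        self.depth += 1

    def unmask(self):
        self.depth = max(0, self.depth - 1)

    @property
    def masked(self):
        return self.depth > 0


if __name__ == "__main__":
    tracked = TrackedLine()
    tracked.attach("driver_a")
    tracked.attach("driver_b")
    tracked.mask("driver_a")
    tracked.detach("driver_a")      # driver_a goes away while holding a mask
    print("tracked line still masked:", tracked.masked)

    untracked = UntrackedLine()
    untracked.mask()                # same mask, but nobody is left to undo it
    print("untracked line still masked:", untracked.masked)
```

Note that this models a handler that goes away while it still holds a mask, which is the shared-interrupt case the flag's description talks about. It says nothing about a driver that dies before masking, which is the case raised earlier in the thread.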

Igor Kovalenko

Re: Which way to go? QNX or Solaris?

Post by Igor Kovalenko » Sat Oct 05, 2002 12:46 am

They are using InterruptAttach().

I'd appreciate a definitive answer on whether or not using
_NTO_INTR_FLAGS_TRK_MSK will prevent 'out of interrupt' events if such a
driver dies before masking the interrupt (whether shared or not).

-- igor

