Hi again Marco,
I have found another problem in DOA which I feel definitely needs to be addressed in the next patch/upgrade. This one caused all kinds of havoc for us here until I was finally able to nail down the source of the problem.
We are using DOA 3.4.6 in a multi-tier scenario (MIDAS) with the Delphi 5 compiler. Basically, components run on application servers (inside MTS) and access the Oracle databases, and several tiers of business and GUI components access those server components from other computers. It's actually much more complicated than that, but those are the basics of our setup.
The symptoms we have been seeing are very sporadic... and wouldn't you know it, they didn't surface in the Dev/QA environments often enough to draw attention... but in the production environment they occur very frequently. We have to deal with it roughly 5-10 times per day on average, and the servers are not even under significant load yet (10 concurrent users max per server). What happens is that at some seemingly random point, when a server MtsDatamodule is destroyed by COM+ Services, the associated process suddenly begins to run at 100% CPU utilization and to consume available memory at a rate of ~200k per second... until the server's available virtual memory pool is exhausted a couple of hours later and the operating system finally goes belly-up. During this period, the COM+ Manager utility shows zero instances of the component. This only happens in MTS modules which make use of DOA components.
We also have much older MIDAS servers which run as executables (these are getting converted to MTS as time permits) and which contain DOA components. These programs have a history of "hanging" during automatic shutdown, and I assume it's actually due to the same problem we see in the MTS configurations - just manifested in a different way.
Basically, the problem (as best I have been able to determine) is that the destructors in the DOA components are currently not bulletproof enough. I am not sure yet which component exactly is the offender, but it is most likely the TOracleSession or TOracleDataset (it could even be a problem with all DOA components).
The problem occurs when an exception is raised during the overridden destructor of one of these components. The inherited destructor then never executes and, more importantly, the destructor of the parent datamodule ceases execution, so the remaining component instances never get freed, and neither does the datamodule itself. COM+ Services is in charge of the object destruction at this point, and it thinks the datamodule has been successfully released; however, it is still partially instantiated, and the DLL is therefore never allowed to be unloaded. I have no idea what happens at that point, other than that we see the process (it's actually a dllhost.exe process, the COM+ hosting executable) spiral out of control.
You can very easily replicate the problem by raising an exception in the destructor of any DOA component which is contained by a TMtsDatamodule registered in COM+ Services.
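A minimal sketch of such a repro follows. It uses a made-up stand-in component (TFaultyComponent is a hypothetical name, not part of DOA) instead of modifying a DOA unit; given the mechanism described above, a component like this placed on a TMtsDatamodule would presumably interrupt the datamodule's destruction in the same way, but that is an assumption and not something confirmed against the DOA source.

```delphi
unit FaultyComponent;

interface

uses
  SysUtils, Classes;

type
  // Hypothetical stand-in for a component whose cleanup fails.
  // Not a DOA class; it is used for illustration only.
  TFaultyComponent = class(TComponent)
  public
    destructor Destroy; override;
  end;

implementation

destructor TFaultyComponent.Destroy;
begin
  // Simulate a cleanup failure. The exception propagates out of the
  // destructor, so the inherited call below is never reached, and the
  // owner that was freeing this component is interrupted as well.
  raise Exception.Create('Simulated failure during cleanup');
  inherited Destroy;
end;

end.
```

Only try this on a test server, since the whole point is to leave the host process in a bad state.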
To fix this, we ended up patching TOracleSession, TOracleDataset, and TOracleQuery (the only components from DOA which we use). For now, we have wrapped every section of those destructors with try..except handlers, logging any exceptions to OutputDebugString (but not re-raising the exceptions). This ensures that the destruction sequence is unhindered, even when an error is encountered, and allows the host process to continue operating normally. We intend to run a debugging tool to monitor the output from OutputDebugString in production for a while, and perhaps we can more accurately pinpoint the offender component in the next few weeks.
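For illustration, the pattern looks roughly like the sketch below. It is not the DOA source: TGuardedComponent and ReleaseResources are made-up names standing in for a DOA component and whatever cleanup its destructor performs. The real destructors have several sections, each of which would get its own try..except block.

```delphi
unit GuardedComponent;

interface

uses
  Windows, SysUtils, Classes;

type
  // Hypothetical stand-in for a patched component; not a DOA class.
  TGuardedComponent = class(TComponent)
  protected
    // Placeholder for cleanup work that might raise an exception.
    procedure ReleaseResources; virtual;
  public
    destructor Destroy; override;
  end;

implementation

procedure TGuardedComponent.ReleaseResources;
begin
  // Real cleanup (closing cursors, logging off, etc.) would go here.
end;

destructor TGuardedComponent.Destroy;
begin
  try
    ReleaseResources;
  except
    // Log and swallow the error so that destruction can continue.
    on E: Exception do
      OutputDebugString(PChar('TGuardedComponent.Destroy: ' +
        E.ClassName + ': ' + E.Message));
  end;
  // Always reached, so the owner can go on freeing its other components.
  inherited Destroy;
end;

end.
```

Swallowing exceptions is normally something to avoid, but in a destructor there is no caller that can do anything useful with the error, so logging it is about the best that can be done.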
I really don't know of any other way to correct the problem besides trapping and ignoring errors in the component destructors. At the point this occurs, COM+ Services has decided the component is no longer needed (refcount reached zero) and that it must be destroyed. If an error occurs during the destruction process, it really doesn't care... as long as the destruction is fully completed.
Is it possible to get this corrected in the next patch/upgrade of DOA? It's a significant problem from our perspective, and it prevents us from using the components in a production environment without patches. I also can't imagine we would be the only DOA customer experiencing this.