Pulze Render Manager stops communicating with Nodes

We’ve now experienced this issue twice. Over the weekend we had several render jobs in the queue, and when I logged in this morning to check progress, all render nodes had stalled. In PRM, the Jobs tab shows the nodes actively working on frames in a job and lists their percent complete. However, when we log into individual nodes, the Virtual Frame Buffer shows that the image has actually finished rendering, while Task Manager lists 3ds Max as Not Responding. It looks like Pulze stopped communicating with the Nodes, and they don’t know what to do.

The last time we experienced this issue, we restarted all instances of PRM and the nodes began rendering again.

We do have PRM running on one machine that acts as the manager and is not added as a Node.

A couple of other things to add here. One Node actually stopped on a different job with the message ‘Writing Output’. That happened a day and a half before the other machines went down.

Also, we tried restarting PRM on just two machines and neither is getting picked up yet. My plan is to restart the rest and see whether one machine in particular is locking things up, or whether ALL of them need to be restarted before rendering resumes.

Any idea why things could be stalling out like this?

Update since the last post… Restarting PRM on our Primary Distributor got everything working again. Since that machine was a very old computer with limited resources, we decided to replace it as our PD and run PRM on an unused workstation. Our render job that ran overnight started off fine, but we still ended up with 4 of our 6 nodes getting stuck, all at different times. My workstation was set to Node Mode using 22 of 24 cores, and its status is listed in PRM as ‘Preparing - Task’, but when I log in I can see a fully rendered frame and 3ds Max is unresponsive. The other 4 stalled nodes have statuses of ‘Writing Output’, ‘Working’ (with no percentage listed) and ‘Render image (0%)’.

Within PRM I tried to reset the tasks that the stalled Nodes were active on, and one of those machines seems to have moved on to another frame even though it is still listed as rendering a previous one.

I’ve got to say, I’m a little disappointed with my Pulze experience so far. We wanted to start using something more reliable than Autodesk’s Backburner but haven’t really seen that yet.

Now I’m just stumped and frustrated. Our first issues with PRM seemed to be related to the machine we’d set as Primary Distributor, which was underpowered and ancient. So we tried a workstation that was not going to be rendering, and that one failed too. But that machine has been having performance issues, which is why we don’t render to it, so my next experiment was to use my own workstation as PD. This machine has had no issues in the past, so everything should work great. Right?

When I logged in this morning to check progress on our render jobs, I found nothing rendering. Even though my workstation is set as PD, the Current Distributor Device field in PRM says ‘No master’. What?!? There’s no problem that I can see with my computer; it wasn’t rendering or doing anything. How did it lose its Distributorship?

EDIT: We were going to attempt to restart all instances of Pulze, but the instance on my workstation (the PD) froze up completely. After I restarted it, all of the old jobs that had been removed the day before showed back up in the queue, and the two new render jobs (which had not completed) are now missing.

Hi @chris.medeck

Thanks for the detailed explanation of your issue. We are looking into the logs that you sent to our support and will come up with a solution!