Pulze Render Manager stops communicating with Nodes

We’ve now experienced this issue twice. Over the weekend we had several render jobs in the queue, and after logging in this morning to check progress all render nodes had stalled. In PRM, the Jobs tab shows the nodes actively working on frames in a job and lists their percent complete. However, logging into individual nodes we can see in the Virtual Frame Buffer that the image is actually finished rendering, but 3ds Max is Not Responding in Task Manager. It looks like Pulze stopped communicating with the Nodes, and they don’t know what to do.

The last time we experienced this issue, we restarted all instances of PRM and then they began rendering again.

We do have PRM running on one machine that is not added as a Node to act as the manager.

A couple other things to add here. One Node actually stopped on a different job with the message Writing Output. That happened a day and a half prior to the other machines going down.

Also, we tried restarting PRM on just two machines and neither is getting picked up yet. My plan is to restart the rest and see if there’s one in particular that is locking things up, or if it is a matter of restarting ALL before it starts rendering.

Any idea why things could be stalling out like this?

Update since the last post… Restarting PRM on our Primary Distributor got everything working again. Since that machine was a very old computer with limited resources, we decided to replace it as our PD and run PRM on an unused workstation. Our render job that ran overnight started off fine, but we still ended up with 4 of our 6 nodes getting stuck, all at different times. My workstation was set to Node Mode using 22 of 24 cores, and its status is listed in PRM as ‘Preparing - Task’, but I could see a fully rendered frame when I logged in, and 3ds Max is unresponsive. The other 4 stalled nodes have statuses of Writing Output, Working (with no percentage listed) and Render image (0%).

Within PRM I tried to reset the tasks that the stalled Nodes were active on, and one of those machines seems to have moved on to another frame even though it is still listed as rendering a previous one.

I’ve got to say, I’m a little disappointed with my Pulze experience so far. We wanted to start using something more reliable than Autodesk’s Backburner but haven’t really seen that yet.

Now I’m just stumped and frustrated. Our first issues with PRM seemed to be related to the machine we’d set as Primary Distributor, which was underpowered and ancient. So we tried a workstation that was not going to be rendering, and that one failed too. But that machine has been having performance issues, which is why we don’t render to it, so my next experiment was to use my own workstation as PD. This machine has had no issues in the past, so everything should work great. Right?

When I logged in this morning to check progress on our render jobs, I found nothing rendering. Even though my workstation is set as PD, the Current Distributor Device field in PRM says ‘No master’. What?!? There’s no problem that I can see with my computer; it wasn’t rendering or doing anything. How did it lose its Distributorship?

EDIT: We were going to attempt to restart all instances of Pulze, but the Pulze on my workstation (as the PD) froze up completely. After I restarted it, all of the old jobs that had been removed the day prior showed back up in the queue and the two new render jobs (that had not completed) are now missing.

Hi @chris.medeck

Thanks for the detailed explanation of your issue. We are looking into the logs that you sent to our support and will come up with a solution!

Hi @peter.sarhidai

Was there a fix to this problem?
We have just moved to Pulze, which is great, but on some renders from 3ds Max 2024 using V-Ray 6, some render nodes hang in the same way described above. They always hang on “writting output : done” in the V-Ray info bar.

Restarting the Pulze app on those nodes helps, but obviously we can’t do this all the time, and we will lose critical render time.

Hope you were able to fix this and can share a little knowledge.

Morning.

Further to my error above I noticed this…
The node shows as booked, and it is rendering when I remote in, but RM doesn’t show the progress of its current task.

I’m wondering if this has something to do with it. I have emailed in a support ticket with logs collected from a few machines.

I think this could be why they get stuck: maybe a node is being booked on two renders without knowing it. Clutching at straws with that bit lol.

We are having the same problems as above, all of them actually.
We come in the next day and see that some number of nodes have hung and Pulze is unresponsive (you have to end-task it and 3ds Max). Sometimes we also have the booked issue, where the job never actually starts.

The hanging issue, where Pulze becomes unresponsive, is the most frequent, and this was not the case two or three point releases ago.

Hi,

We have very similar issues, and after updating to 2.2.10 rendering makes no progress. Even restarting all machines does not help.
@peter.sarhidai, how can we fix these issues?

Got this issue too now… just randomly stops and machine says it’s “BOOKED” a job but never starts…

Until we are ready with the next version, the workaround is to set the processor affinity to all cores minus one. This will keep the connections stable, and your render should go smoothly.

Even when we set -1, it alleviates some of the problems, but the issue still happens. Sometimes a PC also auto-restarts to update, and when it comes back online it doesn’t keep the -1 core setting from before.
Is there an automatic -XX cores setting we can apply?
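
For anyone who wants to script the "all cores minus one" value rather than enter it by hand on every node, here is a minimal Python sketch. It only computes the core count and the matching affinity bitmask for the machine it runs on. It does not talk to Pulze, and nothing in this thread confirms that PRM exposes a setting or API for this. The function names (affinity_mask, usable_core_ids) are hypothetical, and the mask assumes a single Windows processor group (64 logical cores or fewer).

```python
import os


def usable_core_ids(total_cores: int, reserved: int = 1) -> list[int]:
    """Return the core indices left after holding back `reserved` cores.

    At least one core is always kept, so a 1-core machine still gets [0].
    """
    if total_cores < 1:
        raise ValueError("total_cores must be at least 1")
    if reserved < 0:
        raise ValueError("reserved must not be negative")
    usable = max(1, total_cores - reserved)
    return list(range(usable))


def affinity_mask(total_cores: int, reserved: int = 1) -> int:
    """Return a bitmask with one bit set per usable core (lowest cores first)."""
    mask = 0
    for core in usable_core_ids(total_cores, reserved):
        mask |= 1 << core
    return mask


if __name__ == "__main__":
    # os.cpu_count() reports logical cores and may return None on some systems.
    total = os.cpu_count() or 1
    mask = affinity_mask(total, reserved=1)
    print(f"{total} logical cores, use {len(usable_core_ids(total))}, mask 0x{mask:X}")
```

Run at startup (for example from a scheduled task), this would print the value to apply after a reboot. Whether an externally applied affinity survives PRM launching 3ds Max is an assumption that would need testing on a node.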