Hello all,
Has anyone else run into an issue where a distributed job gets multiple nodes assigned (as expected), but the nodes set to “spawner” types disconnect before the job is actually complete?
The Max log file on the “master” node shows the message “Render host s13 ([IP_of_node]:20204) released because there is no more pending work” for every render node that tried to help.
It seems to happen at random points during rendering: some nodes help for the entire job without disconnecting, while others connect to the master node, load the job, render for about 5 minutes, and then disconnect with this message.
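In case it helps anyone compare notes, here is a rough Python sketch for tallying which hosts were released, based only on the single message quoted above. The regex, the function name `count_releases`, and the idea of passing the log path on the command line are my own assumptions; the real log may carry timestamps or other prefixes, so treat this as a starting point rather than a tested tool.

```python
import re
import sys
from collections import Counter

# Assumed message shape, taken from the one line quoted in this post.
# Real log lines may have timestamps or other text around the message.
RELEASE_RE = re.compile(
    r"Render host (?P<host>\S+) \((?P<addr>[^)]*)\) "
    r"released because there is no more pending work"
)


def count_releases(log_lines):
    """Return a Counter mapping host name -> number of 'released' messages."""
    counts = Counter()
    for line in log_lines:
        match = RELEASE_RE.search(line)
        if match:
            counts[match.group("host")] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_releases.py <path to the Max log on the master node>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for host, n in sorted(count_releases(log_file).items()):
            print(f"{host}: released {n} time(s)")
```

Running it against the master node's log after a job should show at a glance how many nodes were released and whether the same ones get dropped repeatedly.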
So if we have 13 render nodes and just one large image to render, it would be nice to have all of the nodes help. Instead, all the nodes help for a few minutes, and then one at a time they are “released” from the job long before it is close to finished. Restarting Pulze on those nodes does not force the Distributor to re-assign them, because for whatever reason the Master node (or the Distributor?) has decided that only 2 nodes are needed for that job.
I searched online and found that this has been a problem with V-Ray spawner behavior before, but I didn’t find any solutions that help in this case.
Has anyone else experienced this type of problem? Thanks!
We are using Max 2024 and V-Ray 6 (but we also saw this happen with Max 2022 and V-Ray 5).