Выведите R-команду через что-то вроде try()

Я запускаю большое количество итераций параллельно. Некоторые итерации занимают много (скажем, 100 раз) дольше других. Я хочу рассказать об этом, но я бы предпочел не копаться в C-коде за функцией (назовите это fun.c), делая тяжелый подъем. Я надеюсь, что есть что-то похожее на try(), но с опцией time.out. Тогда я мог бы сделать что-то вроде:

Итак, если fun.c занял более 60 секунд для определенной итерации, то обновленная функция try() просто убьет его и вернет предупреждение или что-то подобное по этим строкам.

Ответы

Ответ 1

Смотрите эту тему: http://r.789695.n4.nabble.com/Time-out-for-a-R-Function-td3075686.html

и ?evalWithTimeout в пакете R.utils.

Вот пример:

require(R.utils)

## function that can take a long time
fn1 <- function(x)
{
    for (i in 1:x^x)
    {
        rep(x, 1000)
    }
    return("finished")
}

## test timeout
evalWithTimeout(fn1(3), timeout = 1, onTimeout = "error") # should be fine
evalWithTimeout(fn1(8), timeout = 1, onTimeout = "error") # should timeout

Ответ 2

Это похоже на то, что должно быть чем-то, над чем должно управляться все, что отвлекает задачи от рабочих, а не что-то, что должно содержаться в рабочем потоке. Пакет multicore поддерживает таймауты для некоторых функций; snow, насколько я могу судить, не делает.

EDIT: Если вы действительно отчаянно нуждаетесь в этом в рабочих потоках, попробуйте эту функцию, вдохновленную ссылками в ответе @jthezzel.

try_with_time_limit <- function(expr, cpu = Inf, elapsed = Inf)
{
  y <- try({setTimeLimit(cpu, elapsed); expr}, silent = TRUE) 
  if(inherits(y, "try-error")) NULL else y 
}

try_with_time_limit(sqrt(1:10), 1)                   #value returns as normal
try_with_time_limit(for(i in 1:1e7) sqrt(1:10), 1)   #returns NULL

Возможно, вы захотите настроить поведение в случае тайм-аута. На данный момент он просто возвращает NULL.

Ответ 3

Вы упомянули в комментарии, что ваша проблема связана с длительностью кода C. По моему опыту, ни одно из решений тайм-аута, основанных исключительно на R, основанных на setTimeLimit/evalWithTimeout, не может остановить выполнение кода C, если код не дает возможности прервать R.

Вы также упомянули в комментарии, что вы распараллеливаете SNOW. Если вы распараллеливаете компьютеры, это ОС, поддерживающая forking (т.е. Не Windows), то вы можете использовать mcparallel (в пакете parallel, полученный из multicore) в контексте команды, в node на кластере SNOW; обратное также верно BTW, вы можете запускать SNOW-кластеры из контекста вилки multicore. Этот ответ также (конечно) выполняется, если вы не распараллеливаете через SNOW, если машина, которая должна таймаутировать код C, может использовать fork.

Это поддается eval_fork, решению, используемому opencpu. Посмотрите ниже тела функции eval_fork для схемы взлома в Windows и плохо реализованной половины версии этого взлома.

eval_fork <- function(..., timeout=60){

  #this limit must always be higher than the timeout on the fork!
  setTimeLimit(timeout+5);      

  #dispatch based on method
  ##NOTE!!!!! Due to a bug in mcparallel, we cannot use silent=TRUE for now.
  myfork <- parallel::mcparallel({
    eval(...)
  }, silent=FALSE);

  #wait max n seconds for a result.
  myresult <- parallel::mccollect(myfork, wait=FALSE, timeout=timeout);

  #try to avoid bug/race condition where mccollect returns null without waiting full timeout.
  #see https://github.com/jeroenooms/opencpu/issues/131
  #waits for max another 2 seconds if proc looks dead 
  while(is.null(myresult) && totaltime < timeout && totaltime < 2) {
     Sys.sleep(.1)
     enddtime <- Sys.time();
     totaltime <- as.numeric(enddtime - starttime, units="secs")
     myresult <- parallel::mccollect(myfork, wait = FALSE, timeout = timeout);
  }

  #kill fork after collect has returned
  tools::pskill(myfork$pid, tools::SIGKILL);    
  tools::pskill(-1 * myfork$pid, tools::SIGKILL);  

  #clean up:
  parallel::mccollect(myfork, wait=FALSE);

  #timeout?
  if(is.null(myresult)){
    stop("R call did not return within ", timeout, " seconds. Terminating process.", call.=FALSE);      
  }

  #move this to distinguish between timeout and NULL returns
  myresult <- myresult[[1]];

  #reset timer
  setTimeLimit();     

  #forks don't throw errors themselves
  if(inherits(myresult,"try-error")){
    #stop(myresult, call.=FALSE);
    stop(attr(myresult, "condition"));
  }

  #send the buffered response
  return(myresult);  
}

Windows hack: В принципе, особенно с рабочими узлами в SNOW, вы можете добиться чего-то подобного, имея рабочие узлы:

создать переменную для хранения временного файла
сохранить свое рабочее пространство (save.image) в известное местоположение
Используйте системный вызов для загрузки Rscript с помощью R script, который загружает рабочее пространство, сохраненное в node, а затем сохраняет результат (по существу, делает медленную вилку памяти рабочего пространства R).
Введите цикл повторения для каждого рабочего node, который ищет файл результатов, если файл результата не отображается после установленного периода времени, перерыв из цикла и сохранение возвращаемого значения, отражающего тайм-аут
В противном случае, успешно завершите внешний вид и прочитайте сохраненный результат и подготовьте его для возврата

Я написал код a/long/time ago для чего-то вроде mcparallel в Windows на localhost, используя медленные копии памяти. Я бы написал это совершенно по-другому сейчас, но это может дать вам место для начала, поэтому я все равно предоставляю его. Некоторые из них отметили, что russmisc был пакетом, который я пишу, который теперь находится на github как repsych. glibrary - это функция в repsych, которая устанавливает пакет, если он еще не доступен (что потенциально важно, если ваш SNOW не только на localhost).... и, конечно, я не использовал этот код для /years/, и я не тестировал его недавно - возможно, версия, которую я использую, содержит ошибки, которые я разрешил в более поздних версиях.

# Farm has been banished here because it likely violates 
# CRAN rules in regards to where it saves files and is very
# windows specific.  Also, the darn thing is buggy.

#' Create a farm
#'
#' A farm is an external self-terminating instance of R to solve a time consuming problem in R.  
#' Think of it as a (very) poor-person multi-core.
#' For a usage example, see checkFarm.
#' Known issues:  May have a problem if the library gdata has been loaded.//
#' If a farm produces warnings or errors you won't see them
#' If a farm produces an error... it never will produce a result.
#'
#' @export
#' @param commands A text string of commands including line breaks to run.  
#' This must include the result being saved in the object farmName in the file farmResult (both are variables provided by farm() to the farm).
#' @param farmName This is the name of the farm, used for creating and destroying filenames.  One is randomly assigned that is plausibly unique.
#' @param Rloc The location of R.exe.  The default loads the version of R that is stored in the windows registry as being \"current\".
#' @return The farm name is returned to be stored in an object and then used in checkFarm()
#' @seealso \code{\link{checkFarm}} \code{\link{waitForFarm}}
farm <- function(commands,farmName=paste("farm-",as.integer(Sys.time())+runif(1),sep=""),Rloc = NULL)
{
  if (is.null(Rloc)) {Rloc <- paste('\"',readRegistry(paste("Software\\R-core\\R\\",readRegistry("Software\\R-core\\R\\",maxdepth=100)$`Current Version`,"\\",sep=""))$InstallPath,"\\bin",sep="")}
  Rloc <- paste(Rloc,"\\R.exe\"",sep="")
  farmRda <- paste(farmName,".Rda",sep="")
    farmRda.int <- paste(farmName,".int.Rda",sep="") #internal .Rda
    farmR <- paste(farmName,".R",sep="")
    farmResult <- paste(farmName,".res.Rda",sep="") #result .Rda
    unlink(c(farmRda,farmR,farmResult,farmRda.int))
    farmwd <- getwd()
    cat("setwd(\"",farmwd,"\")\n",file=farmR,append=TRUE,sep="")
    #loading the internals to get them, then loading the globals, then reloading the internals to make sure they have haven't been overwritten
  cat("
load(\"",farmRda.int,"\")
load(farmRda)
load(\"",farmRda.int,"\")
        ",file=farmR,append=TRUE,sep="")
    cat("library(russmisc)\n",file=farmR,append=TRUE)
    cat("glibrary(",paste(c(names(sessionInfo()$loadedOnly),names(sessionInfo()$otherPkgs)),collapse=","),")\n",file=farmR,append=TRUE)
    cat(commands,file=farmR,append=TRUE)
    cat("
        unlink(farmRda)
        unlink(farmRda.int)
    ",file=farmR,append=TRUE,sep="")
    save(list = ls(all.names=TRUE,envir=.GlobalEnv), file = farmRda,envir=.GlobalEnv)
    save(list = ls(all.names=TRUE), file = farmRda.int)
    #have to drop the escaped quotes for file.exists to find the file
  if (file.exists(gsub('\"','',Rloc))) {
        cmd <- paste(Rloc," --file=",getwd(),"/",farmR,sep="")
    } else {
        stop(paste("Error in russmisc:farm: Unable to find R.exe at",Rloc))
    }
    print(cmd)
    shell(cmd,wait=FALSE)
    return(farmName)
}
NULL

#' Check a farm
#'
#' See farm() for details on farms.  This function checks for a file based on the farmName parameter called farmName.res.Rda.
#' If that file exists it loads it and returns the object stored by the farm in the object farmName.  If that file does not exist,
#' then the farm is not done processing, and a warning and NULL are returned.  Note that a rapid loop through checkFarm() without Sys.sleep produced an error during development.
#'
#' @export
#' @param farmName This is the name of the farm, used for creating and destroying filenames.  This should be saved from when the farm() is created
#' @seealso \code{\link{farm}} \code{\link{waitForFarm}}
#' @examples 
#' #Example not run
#' #.tmp <- "This is a test of farm()"
#' #exampleFarm <- farm("
#' #print(.tmp)
#' #helloFarm <- 10+2
#' #farmName <- helloFarm
#' #save(farmName,file=farmResult)
#' #")
#' #example.result <- checkFarm(exampleFarm)
#' #while (is.null(example.result)) {
#' #    example.result <- checkFarm(exampleFarm)
#' #    Sys.sleep(1)
#' #}
#' #print(example.result)
checkFarm <- function(farmName) {
  farmResult <- paste(farmName,".res.Rda",sep="")
  farmR <- paste(farmName,".r",sep="")
  if (!file.exists(farmR)) {
    message(paste("Warning in russmisc:checkFarm:  There is no evidence that the farm '",farmName,"' exists (no .r file found).\n",sep=""))
  }
    if (file.exists(farmResult)) {
        load(farmResult)
    unlink(farmResult) #delete the farmResult file
    unlink(farmR)      #delete the script file
        return(farmName)
    } else {
        warning(paste("Warning in russmisc:checkFarm:  The farm '",farmName,"' is not ready.\n",sep=""))
        return(invisible(NULL))
    }
}
NULL

#' Wait for a farm result
#'
#' This function repeatedly checks for a farm, when the farm is found it returns the harvest (the farm result object).
#' If the farm terminated with an error or there is some other sort of coding error, waitForFarm will be an infinate loop. As
#' \code{checkFarm} produces errors on checks when the harvest is not ready, waitForFarm hides these errors in the factory error-catching wrapper.
#'
#' @export
#' @param farmName This is the name of the farm, used for creating and destroying filenames.  This should be saved from when the farm() is created
#' @param noCheck If this value is TRUE the check for the farm .r is skipped.  If it is FALSE, the existance of the appropriate .r is checked for before entering a potentially unending while loop.
waitForFarm <- function(farmName,noCheck=FALSE) {
  f.checkFarm <- factory(checkFarm)
  farmR <- paste(farmName,".r",sep="")
  if (!file.exists(farmR) & !noCheck) {
    stop(paste("Error in russmisc:checkFarm:  There is no evidence that the farm '",farmName,"' exists (no .r file found).\n",sep=""))
  }
  repeat {
    harvest <- f.checkFarm(farmName)
    if (!is.null(harvest[[1]])) {break}
    Sys.sleep(1)
  }
    return(harvest[[1]])
}
NULL

#' Create a one-line simple farm
#'
#' This is a convience wrapper function that uses farm to create a single farm appropriate for processing single line commands.
#'
#' @export
#' @param command A single command
#' @param farmName This is the name of the farm, used for creating and destroying filenames.  One is randomly assigned that is plausibly unique.
#' @param Rloc The location of R.exe.  The default loads the version of R that is stored in the windows registry as being \"current\".
#' @return The farm name is returned to be stored in an object and then used in checkFarm()
#' @seealso \code{\link{farm}}, \code{\link{checkFarm}}, and \code{\link{waitForFarm}}
#' @examples
#' #Example not run
#' #a <- 5
#' #b <- 10
#' #farmID <- simpleFarm("a + b")
#' #waitForFarm(farmID)
simpleFarm <- function(command,farmName=paste("farm-",as.integer(Sys.time())+runif(1),sep=""),Rloc = NULL) {
  return(farm(paste("farmName <- (",command,");save(farmName,file=farmResult)",collapse=""),farmName=paste("farm-",as.integer(Sys.time())+runif(1),sep=""),Rloc = NULL))
}
NULL

Ответ 4

Мне нравится R.utils::withTimeout(), но я также стараюсь избегать зависимостей пакетов, если могу. Вот решение для базы R. Обратите внимание на вызов on.exit(). Он обязательно снимает ограничение по времени, даже если ваше выражение выдает ошибку.

with_timeout <- function(expr, cpu, elapsed){
  expr <- substitute(expr)
  envir <- parent.frame()
  setTimeLimit(cpu = cpu, elapsed = elapsed, transient = TRUE)
  on.exit(setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE))
  eval(expr, envir = envir)
}