Find the non-zero values and frequency of those values in R




I have data with 2 parameters: date/time and flow. The flow data is intermittent. Let's say at times there is 0 flow, then the flow starts, there are non-zero values for some time, and then the flow is 0 again. I want to understand when the non-zero values occur and how long each non-zero flow lasts. I have attached a sample dataset at this location: https://www.dropbox.com/s/ef1411dq4gyg0cm/sampledataflow.csv

The data is 1-minute data.

I am able to import the data into R as follows:

    flow <- read.csv("sampledataflow.csv")
    summary(flow)
    names(flow) <- c("date", "discharge")
    # %H:%M is hours:minutes; %Y assumes four-digit years (use %y for two-digit years)
    flow$date <- strptime(flow$date, format="%m/%d/%Y %H:%M")
    sapply(flow, class)
    plot(flow$date, flow$discharge, type="l")

I made a plot to see the distribution, but I couldn't get a clue where to start to get the frequency of each non-zero value. I would like to see an output table as follows:

    Date    Duration in minutes

Please let me know if I am not clear here. Thanks.

Additional info:

I think I need to check for a non-zero value first and then find how many non-zero values there are continuously before it reaches a 0 value again. What I want to understand is the flow release durations. E.g. in one day there might be multiple releases, and I want to note at what time each release started and how long it went on before coming back to zero. I hope this explains the problem a little better.

The first point is that you have many NAs in your data, in case you want to look into it. If I understand correctly, you require the count of continuous 0's followed by continuous non-zeros, zeros, non-zeros etc. for each date.

This can be achieved with rle of course, as mentioned by @mnel under the comments. But there are quite a few catches.
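As a rough illustration of what rle does here, the sketch below runs it on a small made-up vector (the values are hypothetical and not taken from the sample dataset). Comparing against 0 first turns the series into TRUE/FALSE, so each run is either a stretch of zeros or a stretch of non-zeros.

```r
# Toy discharge vector (made-up values, not from the sample dataset)
discharge <- c(0, 0, 3.2, 4.1, 0, 0, 0, 1.5, 0)

# rle() collapses consecutive equal values into runs
runs <- rle(discharge == 0)

runs$lengths  # number of consecutive rows in each run
runs$values   # TRUE = run of zeros, FALSE = run of non-zeros

# Lengths of the non-zero runs only (rows, i.e. minutes for 1-minute data)
nonzero_durations <- runs$lengths[!runs$values]
nonzero_durations
```

For 1-minute data with no missing rows, a run length is the same as a duration in minutes.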

First, I'll set up the data with non-NA entries:

    flow <- read.csv("~/downloads/sampledataflow.csv")
    names(flow) <- c("date", "discharge")
    flow <- flow[1:33119, ]  # remove NA entries
    # format date to POSIXct to play nice with data.table
    flow$date <- as.POSIXct(flow$date, format="%m/%d/%Y %H:%M")

Next, I'll create a date column:

    flow$g1 <- as.Date(flow$date)

Finally, I prefer using data.table. Here's a solution using it.

    # load the package, convert the data to a data.table and set the key
    require(data.table)
    flow.dt <- data.table(flow)
    # set the key to both "date" and "g1" (even though we'll only use g1)
    # to make sure the order of rows is not changed (during the sort)
    setkey(flow.dt, "date", "g1")
    # group by g1, turn the data into TRUE/FALSE by comparing to 0, and get the rle lengths
    # val: 0 = run of zeros, 1 = run of non-zeros
    out <- flow.dt[, list(duration = rle(discharge == 0)$lengths,
                          val = rle(discharge == 0)$values + 1), by=g1][val == 2, val := 0]
    out  # showing the first and last few entries
    #              g1 duration val
    #   1: 2010-05-31      120   0
    #   2: 2010-06-01      722   0
    #   3: 2010-06-01      138   1
    #   4: 2010-06-01       32   0
    #   5: 2010-06-01       79   1
    #  ---
    #  98: 2010-06-22      291   1
    #  99: 2010-06-22      423   0
    # 100: 2010-06-23      664   0
    # 101: 2010-06-23      278   1
    # 102: 2010-06-23      379   0

So, for example, for 2010-06-01, there are 722 0's followed by 138 non-zeros, followed by 32 0's followed by 79 non-zeros, and so on...
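The question also asks at what time each release started. One possible way to get that, sketched here in base R on a small made-up series rather than the sample CSV, is to turn the run lengths into starting row indices with cumsum. The names run_start and releases are hypothetical, and the sketch assumes the rows are sorted by time with no gaps or NAs.

```r
# Toy 1-minute series (made-up values standing in for the sample CSV)
flow <- data.frame(
  date = as.POSIXct("2010-06-01 00:00", tz = "UTC") + 60 * (0:9),
  discharge = c(0, 0, 5, 6, 7, 0, 0, 2, 3, 0)
)

# TRUE runs are stretches of non-zero flow
runs <- rle(flow$discharge != 0)

# Row index at which each run starts
run_start <- cumsum(c(1, head(runs$lengths, -1)))

# Keep only the non-zero runs: start time and duration in minutes
releases <- data.frame(
  start = flow$date[run_start[runs$values]],
  duration_min = runs$lengths[runs$values]
)
releases
```

Unlike the grouped data.table version, this sketch does not split runs by date, so a release that continues past midnight is counted once rather than once per day.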
