UnbufferedFile improvements v7

Message ID 43E4B4C0.6000106@cadsoft.de
State New
Headers

Commit Message

Klaus Schmidinger Feb. 4, 2006, 2:05 p.m. UTC
Artur Skawina wrote:
> this time with a new approach to read caching. Should make watching and 
> editing recordings on a non-idle (and/or slow) machine more comfortable.
> 
> The difference to previous versions (and stock fadvise-enabled vdr) is 
> that previously, read data was almost immediately forgotten after it was 
> used; now a certain amount of recently accessed data (at most ~16M) is 
> kept around. This means that short seeks (jumps) when replaying do not 
> cause disk accesses. Things like switching play mode, FF, setting and 
> moving editing marks shouldn't usually block waiting for disk IO. The 
> changes are most noticeable when eg. several recordings are happening in 
> the background.
> 
> I did very little testing, treat this as beta quality at best. Seems to 
> work ok, but i won't probably have time to test it further for a few 
> days; maybe somebody wants to play w/ this, or even better take a look 
> at the Read() path...

Attached is the patch as I would add it for the next version.

Please take a look at the lines marked CLARIFY and let me know
if you have any comment on these.

Klaus
  

Comments

Klaus Schmidinger Feb. 4, 2006, 2:28 p.m. UTC | #1
Klaus Schmidinger wrote:
> Artur Skawina wrote:
> 
>> this time with a new approach to read caching. Should make watching 
>> and editing recordings on a non-idle (and/or slow) machine more 
>> comfortable.
>>
>> The difference to previous versions (and stock fadvise-enabled vdr) is 
>> that previously, read data was almost immediately forgotten after it 
>> was used; now a certain amount of recently accessed data (at most 
>> ~16M) is kept around. This means that short seeks (jumps) when 
>> replaying do not cause disk accesses. Things like switching play mode, 
>> FF, setting and moving editing marks shouldn't usually block waiting 
>> for disk IO. The changes are most noticeable when eg. several 
>> recordings are happening in the background.
>>
>> I did very little testing, treat this as beta quality at best. Seems 
>> to work ok, but i won't probably have time to test it further for a 
>> few days; maybe somebody wants to play w/ this, or even better take a 
>> look at the Read() path...
> 
> 
> Attached is the patch as I would add it for the next version.

I just realized that FadviseDrop() should, of course, be
private in cUnbufferedFile. But that doesn't change any
functionality...

Klaus
  
Artur Skawina Feb. 4, 2006, 4:50 p.m. UTC | #2
Klaus Schmidinger wrote:
> Artur Skawina wrote:
>> this time with a new approach to read caching. Should make watching 
>> and editing recordings on a non-idle (and/or slow) machine more 
>> comfortable.
>>
>> The difference to previous versions (and stock fadvise-enabled vdr) is 
>> that previously, read data was almost immediately forgotten after it 
>> was used; now a certain amount of recently accessed data (at most 
>> ~16M) is kept around. This means that short seeks (jumps) when 
>> replaying do not cause disk accesses. Things like switching play mode, 
>> FF, setting and moving editing marks shouldn't usually block waiting 
>> for disk IO. The changes are most noticeable when eg. several 
>> recordings are happening in the background.
>>
>> I did very little testing, treat this as beta quality at best. Seems 
>> to work ok, but i won't probably have time to test it further for a 
>> few days; maybe somebody wants to play w/ this, or even better take a 
>> look at the Read() path...

after running w/ it for a week, i'd now say it works; haven't seen any problems 
that could be related to the Read() code.

> Attached is the patch as I would add it for the next version.

Why not the inlined Seeks()? The IO is very inefficient anyway (tiny read/write 
sizes), having an extra syscall (seek) for every read/write call doesn't help...
(yes, it may not be pretty, but it lets the compiler optimize the calls away, 
w/o touching any callers. Unless you fixed the callers, then ignore this)

> Please take a look at the lines marked CLARIFY and let me know
> if you have any comment on these.

i'll have to rediff this and compare w/ my current version; will post merged 
version later. Comments below.

> +++ cutter.c	2006/02/04 13:40:20
> @@ -66,6 +66,8 @@
>       toFile = toFileName->Open();
>       if (!fromFile || !toFile)
>          return;
> +     fromFile->SetReadAhead(MEGABYTE(20));
> +     toFile->SetReadAhead(MEGABYTE(20));
>       int Index = Mark->position;
>       Mark = fromMarks.Next(Mark);
>       int FileSize = 0;
> @@ -122,6 +125,7 @@
>                      error = "toFile 1";
>                      break;
>                      }
> +                 toFile->SetReadAhead(MEGABYTE(20));
>                   FileSize = 0;
>                   }
>                LastIFrame = 0;
> @@ -162,6 +166,7 @@
>                         error = "toFile 2";
>                         break;
>                         }
> +                    toFile->SetReadAhead(MEGABYTE(20));
>                      FileSize = 0;
>                      }
>                   }

i've dropped the toFile->SetReadAhead() hack -- it doesn't change that much when 
the datasyncs aren't there anyway.

> +++ tools.c	2006/02/04 13:55:24
> +        // Read ahead:
> +        // no jump? (allow small forward jump still inside readahead window).
> +        if (jumped >= 0 && jumped <= (off_t)readahead) {
> +           // Trigger the readahead IO, but only if we've used at least
> +           // 1/2 of the previously requested area. This avoids calling
> +           // fadvise() after every read() call.
> +           if (ahead - curpos < (off_t)(readahead - readahead / 2)) {//XXX CLARIFY: isn't this just 'readahead / 2'???

historical reasons -- it used to be 1/4, not 1/2. I went back to 1/2 after 
switching to the new read caching strategy; previously i was seeing stalls when 
replaying and eg cutting at the same time. With the new code hopefully 1/2 is 
enough and reduces the number of fadvise calls.

> +           // The above fadvise() works when writing slowly (recording), but could
> +           // leave cached data around when writing at a high rate (cutting).
> +           // Also, it seems in some setups, the above does not trigger any I/O and
> +           // the fdatasync() call below has to do all the work (reiserfs with some
> +           // kind of write gathering enabled).
> +           // We add 'readahead' to the threshold in an attempt to increase cutting
> +           // speed; it's a tradeoff -- speed vs RAM-used.
> +           if (totwritten > MEGABYTE(32) + readahead) {
> +              //fdatasync(fd);//XXX CLARIFY: so what, is this necessary or not???

it is not, at least not here, however i left it commented out in case there are 
setups where it actually helps -- it's easier to uncomment this one line for 
testing than to find the right place to insert the call. It's not on by default 
because fsyncs are relatively expensive; i'd rather avoid them unless absolutely 
required. (i've updated the comments in this area since, they will be in the 
merged version)

artur

[for faster responses please cc me; esp. if it's a reply to an older thread -- i 
almost missed this]
  
Klaus Schmidinger Feb. 4, 2006, 5:06 p.m. UTC | #3
Artur Skawina wrote:
> Klaus Schmidinger wrote:
> 
>> Artur Skawina wrote:
>>
>>> this time with a new approach to read caching. Should make watching 
>>> and editing recordings on a non-idle (and/or slow) machine more 
>>> comfortable.
>>>
>>> The difference to previous versions (and stock fadvise-enabled vdr) 
>>> is that previously, read data was almost immediately forgotten after 
>>> it was used; now a certain amount of recently accessed data (at most 
>>> ~16M) is kept around. This means that short seeks (jumps) when 
>>> replaying do not cause disk accesses. Things like switching play 
>>> mode, FF, setting and moving editing marks shouldn't usually block 
>>> waiting for disk IO. The changes are most noticeable when eg. several 
>>> recordings are happening in the background.
>>>
>>> I did very little testing, treat this as beta quality at best. Seems 
>>> to work ok, but i won't probably have time to test it further for a 
>>> few days; maybe somebody wants to play w/ this, or even better take a 
>>> look at the Read() path...
> 
> 
> after running w/ it for a week, i'd now say it works; haven't seen any 
> problems that could be related to the Read() code.
> 
>> Attached is the patch as I would add it for the next version.
> 
> 
> Why not the inlined Seeks()? The IO is very inefficient anyway (tiny 
> read/write sizes), having an extra syscall (seek) for every read/write 
> call doesn't help...
> (yes, it may not be pretty, but it lets the compiler optimize the calls 
> away, w/o touching any callers. Unless you fixed the callers, then 
> ignore this)

Do you really think that making Seek() inline actually makes things
faster? Can you show some hard numbers?

>> Please take a look at the lines marked CLARIFY and let me know
>> if you have any comment on these.
> 
> 
> i'll have to rediff this and compare w/ my current version; will post 
> merged version later. Comments below.
> 
>> +++ cutter.c    2006/02/04 13:40:20
>> @@ -66,6 +66,8 @@
>>       toFile = toFileName->Open();
>>       if (!fromFile || !toFile)
>>          return;
>> +     fromFile->SetReadAhead(MEGABYTE(20));
>> +     toFile->SetReadAhead(MEGABYTE(20));
>>       int Index = Mark->position;
>>       Mark = fromMarks.Next(Mark);
>>       int FileSize = 0;
>> @@ -122,6 +125,7 @@
>>                      error = "toFile 1";
>>                      break;
>>                      }
>> +                 toFile->SetReadAhead(MEGABYTE(20));
>>                   FileSize = 0;
>>                   }
>>                LastIFrame = 0;
>> @@ -162,6 +166,7 @@
>>                         error = "toFile 2";
>>                         break;
>>                         }
>> +                    toFile->SetReadAhead(MEGABYTE(20));
>>                      FileSize = 0;
>>                      }
>>                   }
> 
> 
> i've dropped the toFile->SetReadAhead() hack -- it doesn't change that 
> much when the datasyncs aren't there anyway.
> 
>> +++ tools.c    2006/02/04 13:55:24
>> +        // Read ahead:
>> +        // no jump? (allow small forward jump still inside readahead 
>> window).
>> +        if (jumped >= 0 && jumped <= (off_t)readahead) {
>> +           // Trigger the readahead IO, but only if we've used at least
>> +           // 1/2 of the previously requested area. This avoids calling
>> +           // fadvise() after every read() call.
>> +           if (ahead - curpos < (off_t)(readahead - readahead / 2)) 
>> {//XXX CLARIFY: isn't this just 'readahead / 2'???
> 
> 
> historical reasons -- it used to be 1/4, not 1/2. I went back to 1/2 
> after switching to the new read caching strategy; previously i was 
> seeing stalls when replaying and eg cutting at the same time. With the 
> new code hopefully 1/2 is enough and reduces the number of fadvise calls.

So we could make this

   if (ahead - curpos < (off_t)(readahead / 2)) {

then?

>> +           // The above fadvise() works when writing slowly 
>> (recording), but could
>> +           // leave cached data around when writing at a high rate 
>> (cutting).
>> +           // Also, it seems in some setups, the above does not 
>> trigger any I/O and
>> +           // the fdatasync() call below has to do all the work 
>> (reiserfs with some
>> +           // kind of write gathering enabled).
>> +           // We add 'readahead' to the threshold in an attempt to 
>> increase cutting
>> +           // speed; it's a tradeoff -- speed vs RAM-used.
>> +           if (totwritten > MEGABYTE(32) + readahead) {
>> +              //fdatasync(fd);//XXX CLARIFY: so what, is this 
>> necessary or not???
> 
> 
> it is not, at least not here, however i left it commented out in case 
> there are setups where it actually helps -- it's easier to uncomment 
> this one line for testing than to find the right place to insert the 
> call. It's not on by default because fsyncs are relatively expensive; 
> i'd rather avoid them unless absolutely required. (i've updated the 
> comments in this area since, they will be in the merged version)

Ok. Please make that a diff against version 1.3.41 with vdr-1.3.41-fadvise.diff
applied.

> [for faster responses please cc me; esp. if it's a reply to an older 
> thread -- i almost missed this]

Don't worry, your response was fast enough ;-)

Klaus
  
Artur Skawina Feb. 4, 2006, 5:07 p.m. UTC | #4
>  off_t cUnbufferedFile::Seek(off_t Offset, int Whence)
>  {
> -  if (fd >= 0)
> -     return lseek(fd, Offset, Whence);
> -  return -1;
> +  if (Whence == SEEK_SET && Offset == curpos)
> +     return curpos;
> +  curpos = lseek(fd, Offset, Whence);
> +  return curpos;
>  }

ups, missed this previously -- ignore my comment wrt to seeks...
A function call i can live with :)
  
Clemens Kirchgatterer Feb. 5, 2006, 7:34 a.m. UTC | #5
Artur Skawina <art_k@o2.pl> wrote:

> ups, missed this previously -- ignore my comment wrt to seeks...
> A function call i can live with :)

how could inlining help about the relative slowness of syscalls?
the slow thing about syscalls is not the function call but the jump into
kernel space. also, nobody uses syscall (2) directly anyway, instead we
have read(), write(), lseek() ... wrappers. did i miss something here?

best regards ...
clemens
  

Patch

--- cutter.c	2006/01/27 13:45:00	1.12
+++ cutter.c	2006/02/04 13:40:20
@@ -66,6 +66,8 @@ 
      toFile = toFileName->Open();
      if (!fromFile || !toFile)
         return;
+     fromFile->SetReadAhead(MEGABYTE(20));
+     toFile->SetReadAhead(MEGABYTE(20));
      int Index = Mark->position;
      Mark = fromMarks.Next(Mark);
      int FileSize = 0;
@@ -90,6 +92,7 @@ 
            if (fromIndex->Get(Index++, &FileNumber, &FileOffset, &PictureType, &Length)) {
               if (FileNumber != CurrentFileNumber) {
                  fromFile = fromFileName->SetOffset(FileNumber, FileOffset);
+                 fromFile->SetReadAhead(MEGABYTE(20));
                  CurrentFileNumber = FileNumber;
                  }
               if (fromFile) {
@@ -122,6 +125,7 @@ 
                     error = "toFile 1";
                     break;
                     }
+                 toFile->SetReadAhead(MEGABYTE(20));
                  FileSize = 0;
                  }
               LastIFrame = 0;
@@ -162,6 +166,7 @@ 
                        error = "toFile 2";
                        break;
                        }
+                    toFile->SetReadAhead(MEGABYTE(20));
                     FileSize = 0;
                     }
                  }
--- tools.h	2006/01/15 16:19:56	1.90
+++ tools.h	2006/02/04 13:58:01
@@ -242,15 +242,22 @@ 
 class cUnbufferedFile {
 private:
   int fd;
+  off_t curpos;
+  off_t cachedstart;
+  off_t cachedend;
   off_t begin;
-  off_t end;
+  off_t lastpos;
   off_t ahead;
-  ssize_t written;
+  size_t readahead;
+  size_t written;
+  size_t totwritten;
 public:
   cUnbufferedFile(void);
   ~cUnbufferedFile();
   int Open(const char *FileName, int Flags, mode_t Mode = DEFFILEMODE);
   int Close(void);
+  void SetReadAhead(size_t ra);
+  int FadviseDrop(off_t Offset, off_t Len);
   off_t Seek(off_t Offset, int Whence);
   ssize_t Read(void *Data, size_t Size);
   ssize_t Write(const void *Data, size_t Size);
--- tools.c	2006/01/20 14:01:28	1.112
+++ tools.c	2006/02/04 13:55:24
@@ -1054,10 +1054,9 @@ 
 
 // --- cUnbufferedFile -------------------------------------------------------
 
-//#define USE_FADVISE
+#define USE_FADVISE
 
-#define READ_AHEAD MEGABYTE(2)
-#define WRITE_BUFFER MEGABYTE(10)
+#define WRITE_BUFFER KILOBYTE(800)
 
 cUnbufferedFile::cUnbufferedFile(void)
 {
@@ -1073,8 +1072,17 @@ 
 {
   Close();
   fd = open(FileName, Flags, Mode);
-  begin = end = ahead = -1;
+  curpos = 0;
+#ifdef USE_FADVISE
+  begin = lastpos = ahead = 0;
+  cachedstart = 0;
+  cachedend = 0;
+  readahead = KILOBYTE(128);
   written = 0;
+  totwritten = 0;
+  if (fd >= 0)
+     posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM); // we could use POSIX_FADV_SEQUENTIAL, but we do our own readahead, disabling the kernel one.
+#endif
   return fd;
 }
 
@@ -1082,16 +1090,9 @@ 
 {
 #ifdef USE_FADVISE
   if (fd >= 0) {
-     if (ahead > end)
-        end = ahead;
-     if (begin >= 0 && end > begin) {
-        //dsyslog("close buffer: %d (flush: %d bytes, %ld-%ld)", fd, written, begin, end);
-        if (written)
-           fdatasync(fd);
-        posix_fadvise(fd, begin, end - begin, POSIX_FADV_DONTNEED);
-        }
-     begin = end = ahead = -1;
-     written = 0;
+     if (totwritten)    // if we wrote anything make sure the data has hit the disk before
+        fdatasync(fd);  // calling fadvise, as this is our last chance to un-cache it.
+     posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
      }
 #endif
   int OldFd = fd;
@@ -1099,45 +1100,81 @@ 
   return close(OldFd);
 }
 
+// When replaying and going e.g. FF->PLAY the position jumps back 2..8M
+// hence we do not want to drop recently accessed data at once.
+// We try to handle the common cases such as PLAY->FF->PLAY, small
+// jumps, moving editing marks etc.
+
+#define FADVGRAN   KILOBYTE(4) // AKA fadvise-chunk-size; PAGE_SIZE or getpagesize(2) would also work.
+#define READCHUNK  MEGABYTE(8)
+
+void cUnbufferedFile::SetReadAhead(size_t ra)
+{
+  readahead = ra;
+}
+
+int cUnbufferedFile::FadviseDrop(off_t Offset, off_t Len)
+{
+  // rounding up the window to make sure that not PAGE_SIZE-aligned data gets freed.
+  return posix_fadvise(fd, Offset - (FADVGRAN - 1), Len + (FADVGRAN - 1) * 2, POSIX_FADV_DONTNEED);
+}
+
 off_t cUnbufferedFile::Seek(off_t Offset, int Whence)
 {
-  if (fd >= 0)
-     return lseek(fd, Offset, Whence);
-  return -1;
+  if (Whence == SEEK_SET && Offset == curpos)
+     return curpos;
+  curpos = lseek(fd, Offset, Whence);
+  return curpos;
 }
 
 ssize_t cUnbufferedFile::Read(void *Data, size_t Size)
 {
   if (fd >= 0) {
 #ifdef USE_FADVISE
-     off_t pos = lseek(fd, 0, SEEK_CUR);
-     // jump forward - adjust end position
-     if (pos > end)
-        end = pos;
-     // after adjusting end - don't clear more than previously requested
-     if (end > ahead)
-        end = ahead;
-     // jump backward - drop read ahead of previous run
-     if (pos < begin)
-        end = ahead;
-     if (begin >= 0 && end > begin)
-        posix_fadvise(fd, begin - KILOBYTE(200), end - begin + KILOBYTE(200), POSIX_FADV_DONTNEED);//XXX macros/parameters???
-     begin = pos;
+     off_t jumped = curpos-lastpos; // nonzero means we're not at the last offset
+     if ((cachedstart < cachedend) && (curpos < cachedstart || curpos > cachedend)) {
+        FadviseDrop(cachedstart, cachedend-cachedstart);
+        cachedstart = curpos;
+        cachedend = curpos;
+        }
+     cachedstart = min(cachedstart, curpos);
 #endif
      ssize_t bytesRead = safe_read(fd, Data, Size);
 #ifdef USE_FADVISE
      if (bytesRead > 0) {
-        pos += bytesRead;
-        end = pos;
-        // this seems to trigger a non blocking read - this
-        // may or may not have been finished when we will be called next time.
-        // If it is not finished we can't release the not yet filled buffers.
-        // So this is commented out till we find a better solution.
-        //posix_fadvise(fd, pos, READ_AHEAD, POSIX_FADV_WILLNEED);
-        ahead = pos + READ_AHEAD;
+        curpos += bytesRead;
+        cachedend = max(cachedend, curpos);
+
+        // Read ahead:
+        // no jump? (allow small forward jump still inside readahead window).
+        if (jumped >= 0 && jumped <= (off_t)readahead) {
+           // Trigger the readahead IO, but only if we've used at least
+           // 1/2 of the previously requested area. This avoids calling
+           // fadvise() after every read() call.
+           if (ahead - curpos < (off_t)(readahead - readahead / 2)) {//XXX CLARIFY: isn't this just 'readahead / 2'???
+              posix_fadvise(fd, curpos, readahead, POSIX_FADV_WILLNEED);
+              ahead = curpos + readahead;
+              cachedend = max(cachedend, ahead);
+              }
+           if (readahead < Size * 32) { // automagically tune readahead size.
+              readahead = Size * 32;
+              }
+           }
+        else
+           ahead = curpos; // jumped -> we really don't want any readahead. otherwise eg fast-rewind gets in trouble.
         }
-     else
-        end = pos;
+
+     if (cachedstart < cachedend) {
+        if (curpos - cachedstart > READCHUNK * 2) {
+           FadviseDrop(cachedstart, curpos - READCHUNK - cachedstart);
+           cachedstart = curpos - READCHUNK;
+           }
+        else if (cachedend > ahead && cachedend - curpos > READCHUNK * 2) {
+           FadviseDrop(curpos + READCHUNK, cachedend - curpos + READCHUNK);
+           cachedend = curpos + READCHUNK;
+           }
+        }
+     lastpos = curpos;
 #endif
      return bytesRead;
      }
@@ -1147,28 +1184,34 @@ 
 ssize_t cUnbufferedFile::Write(const void *Data, size_t Size)
 {
   if (fd >=0) {
-#ifdef USE_FADVISE
-     off_t pos = lseek(fd, 0, SEEK_CUR);
-#endif
      ssize_t bytesWritten = safe_write(fd, Data, Size);
 #ifdef USE_FADVISE
-     if (bytesWritten >= 0) {
+     if (bytesWritten > 0) {
+        begin = min(begin, curpos);
+        curpos += bytesWritten;
         written += bytesWritten;
-        if (begin >= 0) {
-           if (pos < begin)
-              begin = pos;
-           }
-        else
-           begin = pos;
-        if (pos + bytesWritten > end)
-           end = pos + bytesWritten;
+        lastpos = max(lastpos, curpos);
         if (written > WRITE_BUFFER) {
-           //dsyslog("flush buffer: %d (%d bytes, %ld-%ld)", fd, written, begin, end);
-           fdatasync(fd);
-           if (begin >= 0 && end > begin)
-              posix_fadvise(fd, begin, end - begin, POSIX_FADV_DONTNEED);
-           begin = end = -1;
+           if (lastpos > begin) {
+              off_t headdrop = min(begin, WRITE_BUFFER * 2L);
+              posix_fadvise(fd, begin - headdrop, lastpos - begin + headdrop, POSIX_FADV_DONTNEED);
+              }
+           begin = lastpos = max(0L, curpos - (KILOBYTE(4) - 1));
+           totwritten += written;
            written = 0;
+           // The above fadvise() works when writing slowly (recording), but could
+           // leave cached data around when writing at a high rate (cutting).
+           // Also, it seems in some setups, the above does not trigger any I/O and
+           // the fdatasync() call below has to do all the work (reiserfs with some
+           // kind of write gathering enabled).
+           // We add 'readahead' to the threshold in an attempt to increase cutting
+           // speed; it's a tradeoff -- speed vs RAM-used.
+           if (totwritten > MEGABYTE(32) + readahead) {
+              //fdatasync(fd);//XXX CLARIFY: so what, is this necessary or not???
+              off_t headdrop = min(curpos - totwritten, totwritten * 2L);
+              posix_fadvise(fd, curpos - totwritten - headdrop, totwritten + headdrop, POSIX_FADV_DONTNEED);
+              totwritten = 0;
+              }
            }
         }
 #endif