VDR-1.3.41: speedup for cVideoRepacker

Message ID 43E13D68.4060801@gmx.de
State New
Headers

Commit Message

Reinhard Nissl Feb. 1, 2006, 10:59 p.m. UTC
  Hi,

Jon Burgess wrote:

>> I don't think that it is worth a try as it tests every byte while the 
>> above code tests most of the time only every third byte.
> 
> I agree that your algorithm is clever and does greatly cut down the 
> number of comparisons as compared to the old code.
> 
> The glibc memchr() implementation does the comparisons 4 bytes at a time 
> using a clever algorithm. It also has assembler optimised variants for 
> some CPU's. I don't think that only doing a comparison of every 3rd byte 
> wins you anything over memchr().
> 
> I believe the bulk of the time taken by the routine is transferring all 
> the data from memory into the CPU. Every byte of the data will have to 
> be read into the CPU caches due to cacheline effects. I believe that the 
> asm optimisations will take into account the possibilities of 
> speculative readahead etc. I've not looked into the assembler to see 
> whether it actually exploits this.
> 
> I've atached the quickly hacked up test program that I wrote. The output 
>  is the time taken for many iterations of the 2 different algorithms. 
> For me the difference is within the measurement noise. It certainly 
> isn't any slower. I'd be interested to know whether it makes any 
> difference on your EPIA, both in the test program and in VDR.

You were right. Using memchr() reduces CPU load on my 600 MHz EPIA 
System by 1 % for channel ZDF and by 4 % for the HDTV channel HDFORUM. 
The numbers were taken by just running VDR in transfer mode for the 
mentioned channel (= no xine attached to VDR).

I also gave memmem() a try but the CPU load was increased by this change.

Attached you'll find an updated patch according to your suggestion.

Bye.
  

Patch

--- ../vdr-1.3.41-orig/remux.c	2006-01-08 12:40:16.000000000 +0100
+++ remux.c	2006-02-01 23:41:09.000000000 +0100
@@ -248,6 +313,14 @@  private:
     scanPicture
     };
   int state;
+  void HandleStartCode(const uchar *const Data, cRingBufferLinear *const ResultBuffer, const uchar *&Payload, const uchar StreamID, const ePesHeader MpegLevel);
+  inline bool ScanDataForStartCodeSlow(const uchar *const Data);
+  inline bool ScanDataForStartCodeFast(const uchar *&Data, const uchar *Limit);
+  inline bool ScanDataForStartCode(const uchar *&Data, int &Done, int &Todo);
+  inline void AdjustCounters(const int Delta, int &Done, int &Todo);
+  inline bool ScanForEndOfPictureSlow(const uchar *&Data);
+  inline bool ScanForEndOfPictureFast(const uchar *&Data, const uchar *Limit);
+  inline bool ScanForEndOfPicture(const uchar *&Data, const uchar *Limit);
 public:
   cVideoRepacker(void);
   virtual void Reset(void);
@@ -267,6 +340,162 @@  void cVideoRepacker::Reset(void)
   state = syncing;
 }
 
+void cVideoRepacker::HandleStartCode(const uchar *const Data, cRingBufferLinear *const ResultBuffer, const uchar *&Payload, const uchar StreamID, const ePesHeader MpegLevel)
+{
+  // synchronisation is detected some bytes after frame start.
+  const int SkippedBytesLimit = 4;
+
+  // which kind of start code have we got?
+  switch (*Data) {
+    case 0xB9 ... 0xFF: // system start codes
+         LOG("cVideoRepacker: found system start code: stream seems to be scrambled or not demultiplexed");
+         break;
+    case 0xB0 ... 0xB1: // reserved start codes
+    case 0xB6:
+         LOG("cVideoRepacker: found reserved start code: stream seems to be scrambled");
+         break;
+    case 0xB4: // sequence error code
+         LOG("cVideoRepacker: found sequence error code: stream seems to be damaged");
+    case 0xB2: // user data start code
+    case 0xB5: // extension start code
+         break;
+    case 0xB7: // sequence end code
+    case 0xB3: // sequence header code
+    case 0xB8: // group start code
+    case 0x00: // picture start code
+         if (state == scanPicture) {
+            // the above start codes indicate that the current picture is done. So
+            // push out the packet to start a new packet for the next picuture. If
+            // the byte count get's negative then the current buffer ends in a
+            // partitial start code that must be stripped off, as it shall be put
+            // in the next packet.
+            PushOutPacket(ResultBuffer, Payload, Data - 3 - Payload);
+            // go on with syncing to the next picture
+            state = syncing;
+            }
+         if (state == syncing) {
+            if (initiallySyncing) // omit report for the typical initial case
+               initiallySyncing = false;
+            else if (skippedBytes > SkippedBytesLimit) // report that syncing dropped some bytes
+               LOG("cVideoRepacker: skipped %d bytes to sync on next picture", skippedBytes - SkippedBytesLimit);
+            skippedBytes = 0;
+            // if there is a PES header available, then use it ...
+            if (pesHeaderBackupLen > 0) {
+               // ISO 13818-1 says:
+               // In the case of video, if a PTS is present in a PES packet header
+               // it shall refer to the access unit containing the first picture start
+               // code that commences in this PES packet. A picture start code commences
+               // in PES packet if the first byte of the picture start code is present
+               // in the PES packet.
+               memcpy(pesHeader, pesHeaderBackup, pesHeaderBackupLen);
+               pesHeaderLen = pesHeaderBackupLen;
+               pesHeaderBackupLen = 0;
+               }
+            else {
+               // ... otherwise create a continuation PES header
+               pesHeaderLen = 0;
+               pesHeader[pesHeaderLen++] = 0x00;
+               pesHeader[pesHeaderLen++] = 0x00;
+               pesHeader[pesHeaderLen++] = 0x01;
+               pesHeader[pesHeaderLen++] = StreamID; // video stream ID
+               pesHeader[pesHeaderLen++] = 0x00; // length still unknown
+               pesHeader[pesHeaderLen++] = 0x00; // length still unknown
+
+               if (MpegLevel == phMPEG2) {
+                  pesHeader[pesHeaderLen++] = 0x80;
+                  pesHeader[pesHeaderLen++] = 0x00;
+                  pesHeader[pesHeaderLen++] = 0x00;
+                  }
+               else
+                  pesHeader[pesHeaderLen++] = 0x0F;
+               }
+            // append the first three bytes of the start code
+            pesHeader[pesHeaderLen++] = 0x00;
+            pesHeader[pesHeaderLen++] = 0x00;
+            pesHeader[pesHeaderLen++] = 0x01;
+            // the next packet's payload will begin with the fourth byte of
+            // the start code (= the actual code)
+            Payload = Data;
+            // as there is no length information available, assume the
+            // maximum we can hold in one PES packet
+            packetTodo = maxPacketSize - pesHeaderLen;
+            // go on with finding the picture data
+            state++;
+            }
+         break;
+    case 0x01 ... 0xAF: // slice start codes
+         if (state == findPicture) {
+            // go on with scanning the picture data
+            state++;
+            }
+         break;
+    }
+}
+
+bool cVideoRepacker::ScanDataForStartCodeSlow(const uchar *const Data)
+{
+  scanner <<= 8;
+  bool FoundStartCode = (scanner == 0x00000100);
+  scanner |= *Data;
+  return FoundStartCode;
+}
+
+bool cVideoRepacker::ScanDataForStartCodeFast(const uchar *&Data, const uchar *Limit)
+{
+  Limit--;
+
+  while (Data < Limit && (Data = (const uchar *)memchr(Data, 0x01, Limit - Data))) {
+        if (Data[-2] || Data[-1])
+           Data += 3;
+        else {
+           scanner = 0x00000100 | *++Data;
+           return true;
+           }
+        }
+
+  Data = Limit;
+  unsigned long *Scanner = (unsigned long *)(Data - 3);
+  scanner = ntohl(*Scanner);
+  return false;
+}
+
+bool cVideoRepacker::ScanDataForStartCode(const uchar *&Data, int &Done, int &Todo)
+{
+  const uchar *const DataOrig = Data;
+  const int MinDataSize = 4;
+
+  if (Todo < MinDataSize || (state != syncing && packetTodo < MinDataSize))
+     return ScanDataForStartCodeSlow(Data);
+
+  int Limit = Todo;
+  if (state != syncing && Limit > packetTodo)
+     Limit = packetTodo;
+
+  if (ScanDataForStartCodeSlow(Data))
+     return true;
+
+  if (ScanDataForStartCodeSlow(++Data)) {
+     AdjustCounters(1, Done, Todo);
+     return true;
+     }
+  ++Data;
+ 
+  bool FoundStartCode = ScanDataForStartCodeFast(Data, DataOrig + Limit);
+  AdjustCounters(Data - DataOrig, Done, Todo);
+  return FoundStartCode;
+}
+
+void cVideoRepacker::AdjustCounters(const int Delta, int &Done, int &Todo)
+{
+  Done += Delta;
+  Todo -= Delta;
+
+  if (state <= syncing)
+     skippedBytes += Delta;
+  else
+     packetTodo -= Delta;
+}
+
 void cVideoRepacker::Repack(cRingBufferLinear *ResultBuffer, const uchar *Data, int Count)
 {
   // synchronisation is detected some bytes after frame start.
@@ -300,98 +529,9 @@  void cVideoRepacker::Repack(cRingBufferL
         if (state <= syncing)
            skippedBytes++;
         // did we reach a start code?
-        scanner <<= 8;
-        if (scanner != 0x00000100)
-           scanner |= *data;
-        else {
-           scanner |= *data;
-
-           // which kind of start code have we got?
-           switch (*data) {
-             case 0xB9 ... 0xFF: // system start codes
-                  LOG("cVideoRepacker: found system start code: stream seems to be scrambled or not demultiplexed");
-                  break;
-             case 0xB0 ... 0xB1: // reserved start codes
-             case 0xB6:
-                  LOG("cVideoRepacker: found reserved start code: stream seems to be scrambled");
-                  break;
-             case 0xB4: // sequence error code
-                  LOG("cVideoRepacker: found sequence error code: stream seems to be damaged");
-             case 0xB2: // user data start code
-             case 0xB5: // extension start code
-                  break;
-             case 0xB7: // sequence end code
-             case 0xB3: // sequence header code
-             case 0xB8: // group start code
-             case 0x00: // picture start code
-                  if (state == scanPicture) {
-                     // the above start codes indicate that the current picture is done. So
-                     // push out the packet to start a new packet for the next picuture. If
-                     // the byte count get's negative then the current buffer ends in a
-                     // partitial start code that must be stripped off, as it shall be put
-                     // in the next packet.
-                     PushOutPacket(ResultBuffer, payload, data - 3 - payload);
-                     // go on with syncing to the next picture
-                     state = syncing;
-                     }
-                  if (state == syncing) {
-                     if (initiallySyncing) // omit report for the typical initial case
-                        initiallySyncing = false;
-                     else if (skippedBytes > SkippedBytesLimit) // report that syncing dropped some bytes
-                        LOG("cVideoRepacker: skipped %d bytes to sync on next picture", skippedBytes - SkippedBytesLimit);
-                     skippedBytes = 0;
-                     // if there is a PES header available, then use it ...
-                     if (pesHeaderBackupLen > 0) {
-                        // ISO 13818-1 says:
-                        // In the case of video, if a PTS is present in a PES packet header
-                        // it shall refer to the access unit containing the first picture start
-                        // code that commences in this PES packet. A picture start code commences
-                        // in PES packet if the first byte of the picture start code is present
-                        // in the PES packet.
-                        memcpy(pesHeader, pesHeaderBackup, pesHeaderBackupLen);
-                        pesHeaderLen = pesHeaderBackupLen;
-                        pesHeaderBackupLen = 0;
-                        }
-                     else {
-                        // ... otherwise create a continuation PES header
-                        pesHeaderLen = 0;
-                        pesHeader[pesHeaderLen++] = 0x00;
-                        pesHeader[pesHeaderLen++] = 0x00;
-                        pesHeader[pesHeaderLen++] = 0x01;
-                        pesHeader[pesHeaderLen++] = Data[3]; // video stream ID
-                        pesHeader[pesHeaderLen++] = 0x00; // length still unknown
-                        pesHeader[pesHeaderLen++] = 0x00; // length still unknown
-
-                        if (mpegLevel == phMPEG2) {
-                           pesHeader[pesHeaderLen++] = 0x80;
-                           pesHeader[pesHeaderLen++] = 0x00;
-                           pesHeader[pesHeaderLen++] = 0x00;
-                           }
-                        else
-                           pesHeader[pesHeaderLen++] = 0x0F;
-                        }
-                     // append the first three bytes of the start code
-                     pesHeader[pesHeaderLen++] = 0x00;
-                     pesHeader[pesHeaderLen++] = 0x00;
-                     pesHeader[pesHeaderLen++] = 0x01;
-                     // the next packet's payload will begin with the fourth byte of
-                     // the start code (= the actual code)
-                     payload = data;
-                     // as there is no length information available, assume the
-                     // maximum we can hold in one PES packet
-                     packetTodo = maxPacketSize - pesHeaderLen;
-                     // go on with finding the picture data
-                     state++;
-                     }
-                  break;
-             case 0x01 ... 0xAF: // slice start codes
-                  if (state == findPicture) {
-                     // go on with scanning the picture data
-                     state++;
-                     }
-                  break;
-             }
-           }
+        if (ScanDataForStartCode(data, done, todo))
+           HandleStartCode(data, ResultBuffer, payload, Data[3], mpegLevel);
+        // move on
         data++;
         done++;
         todo--;
@@ -501,6 +641,79 @@  void cVideoRepacker::Repack(cRingBufferL
      }
 }
 
+bool cVideoRepacker::ScanForEndOfPictureSlow(const uchar *&Data)
+{
+  localScanner <<= 8;
+  localScanner |= *Data++;
+  // check start codes which follow picture data
+  switch (localScanner) {
+    case 0x00000100: // picture start code
+    case 0x000001B8: // group start code
+    case 0x000001B3: // sequence header code
+    case 0x000001B7: // sequence end code
+         return true;
+    }
+  return false;
+}
+
+bool cVideoRepacker::ScanForEndOfPictureFast(const uchar *&Data, const uchar *Limit)
+{
+  Limit--;
+
+  while (Data < Limit && (Data = (const uchar *)memchr(Data, 0x01, Limit - Data))) {
+        if (Data[-2] || Data[-1])
+           Data += 3;
+        else {
+           localScanner = 0x00000100 | *++Data;
+           // check start codes which follow picture data
+           switch (localScanner) {
+             case 0x00000100: // picture start code
+             case 0x000001B8: // group start code
+             case 0x000001B3: // sequence header code
+             case 0x000001B7: // sequence end code
+                  Data++;
+                  return true;
+             default:
+                  Data += 3;
+             }
+           }
+        }
+
+  Data = Limit + 1;
+  unsigned long *LocalScanner = (unsigned long *)(Data - 4);
+  localScanner = ntohl(*LocalScanner);
+  return false;
+}
+
+bool cVideoRepacker::ScanForEndOfPicture(const uchar *&Data, const uchar *Limit)
+{
+  const uchar *const DataOrig = Data;
+  const int MinDataSize = 4;
+  bool FoundEndOfPicture;
+
+  if (Limit - Data <= MinDataSize) {
+     FoundEndOfPicture = false;
+     while (Data < Limit) {
+           if (ScanForEndOfPictureSlow(Data)) {
+              FoundEndOfPicture = true;
+              break;
+              }
+           }
+     }
+  else {
+     FoundEndOfPicture = true;
+     if (!ScanForEndOfPictureSlow(Data)) {
+        if (!ScanForEndOfPictureSlow(Data)) {
+           if (!ScanForEndOfPictureFast(Data, Limit))
+              FoundEndOfPicture = false; 
+           }
+        }
+     }
+
+  localStart += (Data - DataOrig);
+  return FoundEndOfPicture;
+}
+
 int cVideoRepacker::BreakAt(const uchar *Data, int Count)
 {
   if (initiallySyncing)
@@ -522,19 +735,8 @@  int cVideoRepacker::BreakAt(const uchar 
      const uchar *data = Data + PesPayloadOffset + localStart;
      const uchar *limit = Data + Count;
      // scan data
-     while (data < limit) {
-           localStart++;
-           localScanner <<= 8;
-           localScanner |= *data++;
-           // check start codes which follow picture data
-           switch (localScanner) {
-             case 0x00000100: // picture start code
-             case 0x000001B8: // group start code
-             case 0x000001B3: // sequence header code
-             case 0x000001B7: // sequence end code
-                  return data - Data;
-             }
-           }
+     if (ScanForEndOfPicture(data, limit))
+        return data - Data;
      }
   // just fill up packet and append next start code
   return PesPayloadOffset + packetTodo + 4;