VDR-1.3.41: speedup for cVideoRepacker

Message ID	43DD4F85.7060500@gmx.de
State	New
Headers	Message-ID: <43DD4F85.7060500@gmx.de> Date: Mon, 30 Jan 2006 00:28:05 +0100 From: Reinhard Nissl <rnissl@gmx.de> User-Agent: Thunderbird 1.5 (X11/20051201) MIME-Version: 1.0 To: VDR Mailing List <vdr@linuxtv.org> Content-Type: multipart/mixed; boundary="------------050400070609080906080003" Subject: [vdr] VDR-1.3.41: speedup for cVideoRepacker Precedence: list Reply-To: VDR Mailing List <vdr@linuxtv.org> Status: O

Message ID

43DD4F85.7060500@gmx.de

State

New

Headers

Message-ID: <43DD4F85.7060500@gmx.de>
Date: Mon, 30 Jan 2006 00:28:05 +0100
From: Reinhard Nissl <rnissl@gmx.de>
User-Agent: Thunderbird 1.5 (X11/20051201)
MIME-Version: 1.0
To: VDR Mailing List <vdr@linuxtv.org>
Content-Type: multipart/mixed;
	boundary="------------050400070609080906080003"
Subject: [vdr] VDR-1.3.41: speedup for cVideoRepacker
Precedence: list
Reply-To: VDR Mailing List <vdr@linuxtv.org>
Status: O

Commit Message

Reinhard Nissl Jan. 29, 2006, 11:28 p.m. UTC

Hi,

as you may have noticed: ZDF increased it's bitrate to about 7.5 Mbit/s. 
As a result, less powerful VDR systems "suffered" when taking a 
recording or when running in transfer mode.

The attached patch speeds up cVideoRepacker by choosing a known art 
search algorithm for finding start codes in the video stream.

With this patch and a similar one for xine-lib I was finally able to 
watch and record ZDF at the same time on my 600 MHz "budget only" EPIA 
VDR system.

Bye.

Comments

Jon Burgess Jan. 31, 2006, 10:13 p.m. UTC | #1

Reinhard Nissl wrote:
> +bool cVideoRepacker::ScanDataForStartCodeFast(const uchar *&Data, const uchar *Limit)
> +{
> +  Limit--;
> +
> +  while (Data < Limit) {
> +        if (*Data > 0x01)
> +           Data += 3;
> +        else if (*Data == 0x00)
> +           Data++;
> +        else if (Data[-2] != 0x00 || Data[-1] != 0x00)
> +           Data += 3;
> +        else {
> +           scanner = 0x00000100 | *++Data;
> +           return true;
> +           }
> +        }

Did you consider using memchr()? e.g. something like
...
   while (Data < Limit) {
         Data = memchr(Data, 0x01, Limit - Data);
         if (Data == NULL)
            break;
         if (Data[-2] != 0x00 || Data[-1] != 0x00)
            Data += 3;
...

It makes no noticeable difference on my AMD64 machine (<1%), but maybe 
it is worth trying on your EPIA?

	Jon

Reinhard Nissl Jan. 31, 2006, 11:49 p.m. UTC | #2

Hi,

Jon Burgess wrote:

>> +bool cVideoRepacker::ScanDataForStartCodeFast(const uchar *&Data, 
>> const uchar *Limit)
>> +{
>> +  Limit--;
>> +
>> +  while (Data < Limit) {
>> +        if (*Data > 0x01)
>> +           Data += 3;
>> +        else if (*Data == 0x00)
>> +           Data++;
>> +        else if (Data[-2] != 0x00 || Data[-1] != 0x00)
>> +           Data += 3;
>> +        else {
>> +           scanner = 0x00000100 | *++Data;
>> +           return true;
>> +           }
>> +        }
> 
> Did you consider using memchr()? e.g. something like
> ...
>   while (Data < Limit) {
>         Data = memchr(Data, 0x01, Limit - Data);
>         if (Data == NULL)
>            break;
>         if (Data[-2] != 0x00 || Data[-1] != 0x00)
>            Data += 3;
> ...
> 
> It makes no noticeable difference on my AMD64 machine (<1%), but maybe 
> it is worth trying on your EPIA?

I don't think that it is worth a try as it tests every byte while the 
above code tests most of the time only every third byte.

Consider the following data:

hh ii jj kk [ 00 00 01 01 ] mm nn oo 01 01 pp 01
^^^^^^^^^^^^^^^^^^^    ++++++++++^^^   ^   ^^
                   6                4   0    1

which will give you the above distance between consecutive 01 which are 
not part of a startcode (i. e. first slice start code [ 00 00 01 01 ]) 
or immediately following a startcode (the area marked with +++++). ++ or 
^^ indicate bytes which have to be considered for the distance.

For the above case, a statistical analysis will give this numbers:

distance: 0, count: 1
distance: 1, count: 1
distance: 2, count: 0
distance: 3, count: 0
distance: 4, count: 1
distance: 5, count: 0
distance: 6, count: 1
distance: >, count: 0

Now consider real data, like 001.vdr of "Wetten, dass ..?" which was 
broadcast last Saturday on ZDF:

distance:  0, count:    75248
distance:  1, count:   519949
distance:  2, count:   316632
distance:  3, count:   331874
distance:  4, count:   381855
distance:  5, count:   367280
distance:  6, count:   370620
distance:  7, count:   369649
distance:  8, count:   405861
distance:  9, count:   360555
distance: 10, count:   332593
distance: 11, count:   366192
distance: 12, count:   319692
distance: 13, count:   313795
distance: 14, count:   309567
distance: 15, count:   305986
distance: 16, count:   297510
distance: 17, count:   291607
distance: 18, count:   286045
distance:  >, count: 18794252

As you can easily see, it's a waste of time to test every byte (= 
distance 0) for 01 as it is very unlikely to find one.

Bye.

Jon Burgess Feb. 1, 2006, 12:33 a.m. UTC | #3

Reinhard Nissl wrote:
> I don't think that it is worth a try as it tests every byte while the 
> above code tests most of the time only every third byte.

I agree that your algorithm is clever and does greatly cut down the 
number of comparisons as compared to the old code.

The glibc memchr() implementation does the comparisons 4 bytes at a time 
using a clever algorithm. It also has assembler optimised variants for 
some CPU's. I don't think that only doing a comparison of every 3rd byte 
wins you anything over memchr().

I believe the bulk of the time taken by the routine is transferring all 
the data from memory into the CPU. Every byte of the data will have to 
be read into the CPU caches due to cacheline effects. I believe that the 
asm optimisations will take into account the possibilities of 
speculative readahead etc. I've not looked into the assembler to see 
whether it actually exploits this.

I've atached the quickly hacked up test program that I wrote. The output 
  is the time taken for many iterations of the 2 different algorithms. 
For me the difference is within the measurement noise. It certainly 
isn't any slower. I'd be interested to know whether it makes any 
difference on your EPIA, both in the test program and in VDR.

$ ./search /video0/%Click_Online/2005-04-10.04\:28.99.99.rec/001.vdr
Found 10585344 matches in 12.5873 seconds
Found 10585344 matches in 12.6235 seconds

	Jon

Laz Feb. 1, 2006, 9:54 p.m. UTC | #4

On Wednesday 01 February 2006 00:33, Jon Burgess wrote:
> Reinhard Nissl wrote:
> > I don't think that it is worth a try as it tests every byte while the
> > above code tests most of the time only every third byte.
>
> I agree that your algorithm is clever and does greatly cut down the
> number of comparisons as compared to the old code.
>
> The glibc memchr() implementation does the comparisons 4 bytes at a time
> using a clever algorithm. It also has assembler optimised variants for
> some CPU's. I don't think that only doing a comparison of every 3rd byte
> wins you anything over memchr().
>
> I believe the bulk of the time taken by the routine is transferring all
> the data from memory into the CPU. Every byte of the data will have to
> be read into the CPU caches due to cacheline effects. I believe that the
> asm optimisations will take into account the possibilities of
> speculative readahead etc. I've not looked into the assembler to see
> whether it actually exploits this.
>
> I've atached the quickly hacked up test program that I wrote. The output
>   is the time taken for many iterations of the 2 different algorithms.
> For me the difference is within the measurement noise. It certainly
> isn't any slower. I'd be interested to know whether it makes any
> difference on your EPIA, both in the test program and in VDR.
>
> $ ./search /video0/%Click_Online/2005-04-10.04\:28.99.99.rec/001.vdr
> Found 10585344 matches in 12.5873 seconds
> Found 10585344 matches in 12.6235 seconds

Just done a quick test on my Epia MII-12000 system (whilst under heavy load 
running softdevice, etc. so the numbers are probably crap!):

laz@vdr-tng 2006-02-01.18.28.50.50.del $ ~/stmp 001.vdr
Found 12721024 matches in 276.0596 seconds
Found 12721024 matches in 201.9245 seconds

I'll try to test it again later when it has finished recording so it is more 
idle! With the above numbers, it looks like the memchr() is quite a bit 
faster, at least under these conditions!

Cheers,

Laz

Cheers,

Laz

Laz Feb. 1, 2006, 11:53 p.m. UTC | #5

On Wednesday 01 February 2006 21:54, Laz wrote:
> Just done a quick test on my Epia MII-12000 system (whilst under heavy load
> running softdevice, etc. so the numbers are probably crap!):
>
> laz@vdr-tng 2006-02-01.18.28.50.50.del $ ~/stmp 001.vdr
> Found 12721024 matches in 276.0596 seconds
> Found 12721024 matches in 201.9245 seconds
>
> I'll try to test it again later when it has finished recording so it is
> more idle! With the above numbers, it looks like the memchr() is quite a
> bit faster, at least under these conditions!

And not under load:
laz@vdr-tng 2006-02-01.18.28.50.50.del $ ~/stmp 001.vdr
Found 12721024 matches in 47.0021 seconds
Found 12721024 matches in 32.8197 seconds

memchr() definitely looks better for my Epia system.

I'll try Reinhard's new patch next...

Cheers,

Laz

diff mbox

Patch

--- ../vdr-1.3.41-orig/remux.c	2006-01-08 12:40:16.000000000 +0100
+++ remux.c	2006-01-29 23:51:06.000000000 +0100
@@ -248,6 +313,14 @@  private:
     scanPicture
     };
   int state;
+  void HandleStartCode(const uchar *const Data, cRingBufferLinear *const ResultBuffer, const uchar *&Payload, const uchar StreamID, const ePesHeader MpegLevel);
+  inline bool ScanDataForStartCodeSlow(const uchar *const Data);
+  inline bool ScanDataForStartCodeFast(const uchar *&Data, const uchar *Limit);
+  inline bool ScanDataForStartCode(const uchar *&Data, int &Done, int &Todo);
+  inline void AdjustCounters(const int Delta, int &Done, int &Todo);
+  inline bool ScanForEndOfPictureSlow(const uchar *&Data);
+  inline bool ScanForEndOfPictureFast(const uchar *&Data, const uchar *Limit);
+  inline bool ScanForEndOfPicture(const uchar *&Data, const uchar *Limit);
 public:
   cVideoRepacker(void);
   virtual void Reset(void);
@@ -267,6 +340,166 @@  void cVideoRepacker::Reset(void)
   state = syncing;
 }
 
+void cVideoRepacker::HandleStartCode(const uchar *const Data, cRingBufferLinear *const ResultBuffer, const uchar *&Payload, const uchar StreamID, const ePesHeader MpegLevel)
+{
+  // synchronisation is detected some bytes after frame start.
+  const int SkippedBytesLimit = 4;
+
+  // which kind of start code have we got?
+  switch (*Data) {
+    case 0xB9 ... 0xFF: // system start codes
+         LOG("cVideoRepacker: found system start code: stream seems to be scrambled or not demultiplexed");
+         break;
+    case 0xB0 ... 0xB1: // reserved start codes
+    case 0xB6:
+         LOG("cVideoRepacker: found reserved start code: stream seems to be scrambled");
+         break;
+    case 0xB4: // sequence error code
+         LOG("cVideoRepacker: found sequence error code: stream seems to be damaged");
+    case 0xB2: // user data start code
+    case 0xB5: // extension start code
+         break;
+    case 0xB7: // sequence end code
+    case 0xB3: // sequence header code
+    case 0xB8: // group start code
+    case 0x00: // picture start code
+         if (state == scanPicture) {
+            // the above start codes indicate that the current picture is done. So
+            // push out the packet to start a new packet for the next picuture. If
+            // the byte count get's negative then the current buffer ends in a
+            // partitial start code that must be stripped off, as it shall be put
+            // in the next packet.
+            PushOutPacket(ResultBuffer, Payload, Data - 3 - Payload);
+            // go on with syncing to the next picture
+            state = syncing;
+            }
+         if (state == syncing) {
+            if (initiallySyncing) // omit report for the typical initial case
+               initiallySyncing = false;
+            else if (skippedBytes > SkippedBytesLimit) // report that syncing dropped some bytes
+               LOG("cVideoRepacker: skipped %d bytes to sync on next picture", skippedBytes - SkippedBytesLimit);
+            skippedBytes = 0;
+            // if there is a PES header available, then use it ...
+            if (pesHeaderBackupLen > 0) {
+               // ISO 13818-1 says:
+               // In the case of video, if a PTS is present in a PES packet header
+               // it shall refer to the access unit containing the first picture start
+               // code that commences in this PES packet. A picture start code commences
+               // in PES packet if the first byte of the picture start code is present
+               // in the PES packet.
+               memcpy(pesHeader, pesHeaderBackup, pesHeaderBackupLen);
+               pesHeaderLen = pesHeaderBackupLen;
+               pesHeaderBackupLen = 0;
+               }
+            else {
+               // ... otherwise create a continuation PES header
+               pesHeaderLen = 0;
+               pesHeader[pesHeaderLen++] = 0x00;
+               pesHeader[pesHeaderLen++] = 0x00;
+               pesHeader[pesHeaderLen++] = 0x01;
+               pesHeader[pesHeaderLen++] = StreamID; // video stream ID
+               pesHeader[pesHeaderLen++] = 0x00; // length still unknown
+               pesHeader[pesHeaderLen++] = 0x00; // length still unknown
+
+               if (MpegLevel == phMPEG2) {
+                  pesHeader[pesHeaderLen++] = 0x80;
+                  pesHeader[pesHeaderLen++] = 0x00;
+                  pesHeader[pesHeaderLen++] = 0x00;
+                  }
+               else
+                  pesHeader[pesHeaderLen++] = 0x0F;
+               }
+            // append the first three bytes of the start code
+            pesHeader[pesHeaderLen++] = 0x00;
+            pesHeader[pesHeaderLen++] = 0x00;
+            pesHeader[pesHeaderLen++] = 0x01;
+            // the next packet's payload will begin with the fourth byte of
+            // the start code (= the actual code)
+            Payload = Data;
+            // as there is no length information available, assume the
+            // maximum we can hold in one PES packet
+            packetTodo = maxPacketSize - pesHeaderLen;
+            // go on with finding the picture data
+            state++;
+            }
+         break;
+    case 0x01 ... 0xAF: // slice start codes
+         if (state == findPicture) {
+            // go on with scanning the picture data
+            state++;
+            }
+         break;
+    }
+}
+
+bool cVideoRepacker::ScanDataForStartCodeSlow(const uchar *const Data)
+{
+  scanner <<= 8;
+  bool FoundStartCode = (scanner == 0x00000100);
+  scanner |= *Data;
+  return FoundStartCode;
+}
+
+bool cVideoRepacker::ScanDataForStartCodeFast(const uchar *&Data, const uchar *Limit)
+{
+  Limit--;
+
+  while (Data < Limit) {
+        if (*Data > 0x01)
+           Data += 3;
+        else if (*Data == 0x00)
+           Data++;
+        else if (Data[-2] != 0x00 || Data[-1] != 0x00)
+           Data += 3;
+        else {
+           scanner = 0x00000100 | *++Data;
+           return true;
+           }
+        }
+
+  Data = Limit;
+  unsigned long *Scanner = (unsigned long *)(Data - 3);
+  scanner = ntohl(*Scanner);
+  return false;
+}
+
+bool cVideoRepacker::ScanDataForStartCode(const uchar *&Data, int &Done, int &Todo)
+{
+  const uchar *const DataOrig = Data;
+  const int MinDataSize = 4;
+
+  if (Todo < MinDataSize || (state != syncing && packetTodo < MinDataSize))
+     return ScanDataForStartCodeSlow(Data);
+
+  int Limit = Todo;
+  if (state != syncing && Limit > packetTodo)
+     Limit = packetTodo;
+
+  if (ScanDataForStartCodeSlow(Data))
+     return true;
+
+  if (ScanDataForStartCodeSlow(++Data)) {
+     AdjustCounters(1, Done, Todo);
+     return true;
+     }
+  ++Data;
+ 
+  bool FoundStartCode = ScanDataForStartCodeFast(Data, DataOrig + Limit);
+  AdjustCounters(Data - DataOrig, Done, Todo);
+  return FoundStartCode;
+}
+
+void cVideoRepacker::AdjustCounters(const int Delta, int &Done, int &Todo)
+{
+  Done += Delta;
+  Todo -= Delta;
+
+  if (state <= syncing)
+     skippedBytes += Delta;
+  else
+     packetTodo -= Delta;
+}
+
 void cVideoRepacker::Repack(cRingBufferLinear *ResultBuffer, const uchar *Data, int Count)
 {
   // synchronisation is detected some bytes after frame start.
@@ -300,98 +533,9 @@  void cVideoRepacker::Repack(cRingBufferL
         if (state <= syncing)
            skippedBytes++;
         // did we reach a start code?
-        scanner <<= 8;
-        if (scanner != 0x00000100)
-           scanner |= *data;
-        else {
-           scanner |= *data;
-
-           // which kind of start code have we got?
-           switch (*data) {
-             case 0xB9 ... 0xFF: // system start codes
-                  LOG("cVideoRepacker: found system start code: stream seems to be scrambled or not demultiplexed");
-                  break;
-             case 0xB0 ... 0xB1: // reserved start codes
-             case 0xB6:
-                  LOG("cVideoRepacker: found reserved start code: stream seems to be scrambled");
-                  break;
-             case 0xB4: // sequence error code
-                  LOG("cVideoRepacker: found sequence error code: stream seems to be damaged");
-             case 0xB2: // user data start code
-             case 0xB5: // extension start code
-                  break;
-             case 0xB7: // sequence end code
-             case 0xB3: // sequence header code
-             case 0xB8: // group start code
-             case 0x00: // picture start code
-                  if (state == scanPicture) {
-                     // the above start codes indicate that the current picture is done. So
-                     // push out the packet to start a new packet for the next picuture. If
-                     // the byte count get's negative then the current buffer ends in a
-                     // partitial start code that must be stripped off, as it shall be put
-                     // in the next packet.
-                     PushOutPacket(ResultBuffer, payload, data - 3 - payload);
-                     // go on with syncing to the next picture
-                     state = syncing;
-                     }
-                  if (state == syncing) {
-                     if (initiallySyncing) // omit report for the typical initial case
-                        initiallySyncing = false;
-                     else if (skippedBytes > SkippedBytesLimit) // report that syncing dropped some bytes
-                        LOG("cVideoRepacker: skipped %d bytes to sync on next picture", skippedBytes - SkippedBytesLimit);
-                     skippedBytes = 0;
-                     // if there is a PES header available, then use it ...
-                     if (pesHeaderBackupLen > 0) {
-                        // ISO 13818-1 says:
-                        // In the case of video, if a PTS is present in a PES packet header
-                        // it shall refer to the access unit containing the first picture start
-                        // code that commences in this PES packet. A picture start code commences
-                        // in PES packet if the first byte of the picture start code is present
-                        // in the PES packet.
-                        memcpy(pesHeader, pesHeaderBackup, pesHeaderBackupLen);
-                        pesHeaderLen = pesHeaderBackupLen;
-                        pesHeaderBackupLen = 0;
-                        }
-                     else {
-                        // ... otherwise create a continuation PES header
-                        pesHeaderLen = 0;
-                        pesHeader[pesHeaderLen++] = 0x00;
-                        pesHeader[pesHeaderLen++] = 0x00;
-                        pesHeader[pesHeaderLen++] = 0x01;
-                        pesHeader[pesHeaderLen++] = Data[3]; // video stream ID
-                        pesHeader[pesHeaderLen++] = 0x00; // length still unknown
-                        pesHeader[pesHeaderLen++] = 0x00; // length still unknown
-
-                        if (mpegLevel == phMPEG2) {
-                           pesHeader[pesHeaderLen++] = 0x80;
-                           pesHeader[pesHeaderLen++] = 0x00;
-                           pesHeader[pesHeaderLen++] = 0x00;
-                           }
-                        else
-                           pesHeader[pesHeaderLen++] = 0x0F;
-                        }
-                     // append the first three bytes of the start code
-                     pesHeader[pesHeaderLen++] = 0x00;
-                     pesHeader[pesHeaderLen++] = 0x00;
-                     pesHeader[pesHeaderLen++] = 0x01;
-                     // the next packet's payload will begin with the fourth byte of
-                     // the start code (= the actual code)
-                     payload = data;
-                     // as there is no length information available, assume the
-                     // maximum we can hold in one PES packet
-                     packetTodo = maxPacketSize - pesHeaderLen;
-                     // go on with finding the picture data
-                     state++;
-                     }
-                  break;
-             case 0x01 ... 0xAF: // slice start codes
-                  if (state == findPicture) {
-                     // go on with scanning the picture data
-                     state++;
-                     }
-                  break;
-             }
-           }
+        if (ScanDataForStartCode(data, done, todo))
+           HandleStartCode(data, ResultBuffer, payload, Data[3], mpegLevel);
+        // move on
         data++;
         done++;
         todo--;
@@ -501,6 +645,83 @@  void cVideoRepacker::Repack(cRingBufferL
      }
 }
 
+bool cVideoRepacker::ScanForEndOfPictureSlow(const uchar *&Data)
+{
+  localScanner <<= 8;
+  localScanner |= *Data++;
+  // check start codes which follow picture data
+  switch (localScanner) {
+    case 0x00000100: // picture start code
+    case 0x000001B8: // group start code
+    case 0x000001B3: // sequence header code
+    case 0x000001B7: // sequence end code
+         return true;
+    }
+  return false;
+}
+
+bool cVideoRepacker::ScanForEndOfPictureFast(const uchar *&Data, const uchar *Limit)
+{
+  Limit--;
+
+  while (Data < Limit) {
+        if (*Data > 0x01)
+           Data += 3;
+        else if (*Data == 0x00)
+           Data++;
+        else if (Data[-2] != 0x00 || Data[-1] != 0x00)
+           Data += 3;
+        else {
+           localScanner = 0x00000100 | *++Data;
+           // check start codes which follow picture data
+           switch (localScanner) {
+             case 0x00000100: // picture start code
+             case 0x000001B8: // group start code
+             case 0x000001B3: // sequence header code
+             case 0x000001B7: // sequence end code
+                  Data++;
+                  return true;
+             default:
+                  Data += 3;
+             }
+           }
+        }
+
+  Data = Limit + 1;
+  unsigned long *LocalScanner = (unsigned long *)(Data - 4);
+  localScanner = ntohl(*LocalScanner);
+  return false;
+}
+
+bool cVideoRepacker::ScanForEndOfPicture(const uchar *&Data, const uchar *Limit)
+{
+  const uchar *const DataOrig = Data;
+  const int MinDataSize = 4;
+  bool FoundEndOfPicture;
+
+  if (Limit - Data <= MinDataSize) {
+     FoundEndOfPicture = false;
+     while (Data < Limit) {
+           if (ScanForEndOfPictureSlow(Data)) {
+              FoundEndOfPicture = true;
+              break;
+              }
+           }
+     }
+  else {
+     FoundEndOfPicture = true;
+     if (!ScanForEndOfPictureSlow(Data)) {
+        if (!ScanForEndOfPictureSlow(Data)) {
+           if (!ScanForEndOfPictureFast(Data, Limit))
+              FoundEndOfPicture = false; 
+           }
+        }
+     }
+
+  localStart += (Data - DataOrig);
+  return FoundEndOfPicture;
+}
+
 int cVideoRepacker::BreakAt(const uchar *Data, int Count)
 {
   if (initiallySyncing)
@@ -522,19 +743,8 @@  int cVideoRepacker::BreakAt(const uchar 
      const uchar *data = Data + PesPayloadOffset + localStart;
      const uchar *limit = Data + Count;
      // scan data
-     while (data < limit) {
-           localStart++;
-           localScanner <<= 8;
-           localScanner |= *data++;
-           // check start codes which follow picture data
-           switch (localScanner) {
-             case 0x00000100: // picture start code
-             case 0x000001B8: // group start code
-             case 0x000001B3: // sequence header code
-             case 0x000001B7: // sequence end code
-                  return data - Data;
-             }
-           }
+     if (ScanForEndOfPicture(data, limit))
+        return data - Data;
      }
   // just fill up packet and append next start code
   return PesPayloadOffset + packetTodo + 4;