Novalight - very fast tape loading

NekoNoNiaow · Post by **NekoNoNiaow** » Tue Apr 02, 2019 2:45 am

Symoon wrote: ↑Sun Mar 31, 2019 5:46 pm To be honest, I tried all the solutions we talked about and always found a program which ended being slower
So after a full day/night spent on it for a difference of something like 0.005s, I ended up choosing to keep a simple code and make a break with this - really needed to.

I'm working on another optimization which, should it work, may save (I hope) about 1 second for Zorgons Revenge. And it changes the dictionary too :p

Yup, there will always be a program which makes your heuristic wrong since the structure of data in the file ultimately determines what is the best dictionary.

I have been working on writing a combinatorial exploration algorithm with a user specified coverage parameter:

1 = simply take the best candidate at each round, that is explore only one branch of the combinatorial tree (1 over n! possible choices).
This is simply the greedy strategy I initially recommended.
2 = explore two branches of the tree, that is explore n!/(n-2)! choices (for n = 100 that is 100 * 99 = 9900 explorations.
3 = explore three branches of the tree, n! / (n-3)! choices (for n = 100, = 100 * 99 * 98 = 970200)
...
n = explore n branches of the tree : n! choices (for n=100 that's 9.33e+157 choices )

I will add it to the thread when I am done but it really is not that complicated.

Note that exploring the entire tree will be impossible obviously but that is fine:
playing with the Google sheet I posted some time ago shows that in general, exploring two branches is very often enough to select quite good choices. The sheet generates random data every time so it is very quick to have a look at the graph and get a grasp of how deep one generally needs to go.
Moreover, these numbers are probably worse than actual Oric data since I use a completely random distribution whereas structured data such as Oric programs should probably observe a Normal distribution and thus offer simpler choices.

I would expect that going three or four levels deep would give very good results.
I will post that soon.

NekoNoNiaow · Post by **NekoNoNiaow** » Wed Apr 03, 2019 5:51 am

So, here is the exhaustive exploration algorithm.

Note that this is:

a mix of c and c++ idioms
untested, just written in an editor and a sheet of paper

and thus is bound to be buggy if not entirely algorithmically faulty.

Code: Select all

struct S_Element {
  int id;   // uniquely identifies this element
  int freq;
  int size;
};
struct S_Examined {
  int id;   // comes from S_Element.id, is used to identify S_Element-s.
  int gain;
};
struct S_Chosen {
  int totalgain;
  std::list<S_Examined> elements;
};

// Ugly globals.
int DictionarySizes[] = { 4, 6, 9, 14, 21, 32, 48, 72 }; // replace with real values.
int DictCount         = 8;                               // replace with real values.
int UsedDictEntries   = 0;

int main(argc, argv)
{
  int maxWidth = atoi( argv[1] );  // run as "carrot.exe <width>"

  std::vector<S_Element> candidates;
  candidates.push_back( { /* A */ 0, 800,  50, /*bytes...*/ } ); // invalid syntax but you get the idea
  candidates.push_back( { /* B */ 1, 390, 180, /*bytes...*/ } );
  candidates.push_back( { /* C */ 2, 280, 160, /*bytes...*/ } );
  candidates.push_back( { /* D */ 3, 430,  10, /*bytes...*/ } );
  // and so on...

  if (maxWidth > candidates.size())
  {
    maxWidth = candidates.size(); // Do not explore more choices than exist.
  }
  if (maxWidth > DictCount)
  {
    maxWidth = DictCount; // No need to return more results than dictionary entries.
  }

  S_Chosen chosen = ChooseElements( DictionarySizes, &candidates[0], candidates.size(), maxWidth );

  double examinedCount = Factorial(maxWidth);
  examinedCount *= examinedCount; // Because we are re-examining all entries at each step.
  double totalCount = Factorial(candidates.size());
  totalCount *= totalCount;
  printf("Examined %.2f combinations out of a total of %.2f.\n", examinedCount);
  printf("Size reduction obtained: %d\n", chosen.totalgain);
  printf("Fill your dictionary with (in order):\n");
  for (auto it = chosen.elements.begin() ; *it ; ++it )
  {
    printf("id %d\n", (*it).id);
  }
}

long Factorial(int n)
{
  // Note: this might overflow, do not use too large values of width.
  double top = 1.f;
  for (int i = 2 ; i <= n ; i++ )
  {
    top *= (double)i;
  }

  return top;
}

bool CompareElements(S_Examined const& a, S_Examined const& b)
{
  return a.gain < b.gain; // Sort by increasing gain.
}

S_Chosen ChooseElements( const int* const dict, const S_Element* const candidates, int count, int width )
{
  S_Chosen chosen; // Will be returned with RVO so declared first is better.
  chosen.totalgain = ~0; // largest negative number, can also use std::limits<int>::Min() or something like that.

  if (count == 0 || width == 0)
  {
    chosen.totalgain = 0; // No candidates or dictionary entries, return an empty list.
    return chosen;
  }

  std::vector<S_Examined> examined;

  for (int c = 0 ; c < count ; c++)
  {
    // Compute the gain obtained by encoding this candidate in the first spot of the dictionary.
    S_candidate& cand = *candidates[i];
    int gain = ComputeGain(dict, cand);
    examined.push_back( { c, gain } );  // Add to vector for sorting.
  }

  // We sort by gain, this allows us to make the greedy choice of "best immediate candidate"
  // when we are asked to examine the minimum number of choices (width = 1).

  std::sort(examined, CompareElements); // Sort by gain, highest last.

  // Below is the loop where we examine "width" possible branches starting from the
  // current best gain candidate. If width is 1, then we only examine the first best
  // candidate, if width is 2, then we examine the two best candidates,
  // if width == count, then we examine all possible candidates.

  int index = count - 1;
  S_Examined current = examined.last();
  examined.pop_back();  // Remove from the array (since we are choosing it for the first iteration).

  do
  {
    // Obtain the rest of the entries to put in the dictionary (recursive call).
    std::vector<S_Element>&
    S_Chosen rest = ComputeGains( dict + 1, &examined[0], count - 1, width - 1 );

    // If this returned list has better or equal total gain than the current one, we choose it.
    if (current.gain + rest.totalgain >= bestTotalGain)
    {
      chosen.totalgain = current.gain + rest.totalgain;
      chosen.elements.swap(rest);
      chosen.elements.push_front(current);
    }

    // Get the "next current" and place the "current current" back into the vector in its place.
    S_Examined next = examined[--index];
    examined[index] = current;
    current = next;

  } while (width-- >= 0); // We do at least one run of that loop.

  return chosen;
}

int ComputeGain( int* dictionary, S_Element& element)
{
  // Note: this could be negative
  int gain = element.freq * (element.size - dictionary[0]);

  return gain;
}

Symoon · Post by **Symoon** » Wed Apr 03, 2019 10:04 am

Ouch, thanks

I promise I'll test it once I finish (or drop!) v1.3, which is half-coded and will require heavy testing, as I'm changing many more things than expected.

Symoon · Post by **Symoon** » Fri Apr 05, 2019 11:53 am

Okay, so the new version is promising but not working yet, and not sure it will.
Loading time saved on Zorgon would be more than 1 second

So far, "normal" (uncompressed) bytes seem to load correctly, but I have bugs and not been able to check yet if it's a global timing problem, or specific to compressed bytes - RLE or dictionary.
Oh, and it seems it's not working with Euphoric anymore - no idea why but I don't care yet as it's not the main target.

Symoon · Post by **Symoon** » Wed Apr 10, 2019 7:38 pm

Giving up Novalight 1.3.
Its main goal was to try reducing the Oric working time to save a decoded byte in memory by 1 sample (22.67µs). This also opened a door to extend the dictionary.

But timings are too short for real machines. It worked at 95% on emulators, but definitely doesn't on real Orics.

That being said, in the process I found a few more small optimisations that might bring new ideas. Maybe.

Symoon · Post by **Symoon** » Mon Apr 22, 2019 6:36 am

Couldn't help trying again

And got a prototpye of Novalight 1.3 to work on Oricutron and Euphoric... But not on a real Oric.

The decoding is probably too demanding, there's actually only a 3µs margin left on emulators, and I suppose real hardware doesn't cope with it. Hey, that's a difference between real machines and emulators

That's too bad, it really was opening doors to new things, but I suppose it has to stop at some point! Also, this very last version put the whole interrupt in page 2 (to save a JMP, 3µs), which was already risky as version 1.1 showed.

NekoNoNiaow · Post by **NekoNoNiaow** » Mon Apr 22, 2019 9:36 pm

Symoon wrote: ↑Mon Apr 22, 2019 6:36 am The decoding is probably too demanding, there's actually only a 3µs margin left on emulators, and I suppose real hardware doesn't cope with it. Hey, that's a difference between real machines and emulators

Is this the case for all emulators? Out of curiosity which ones did you try?
Also, I guess you could submit the issue to emulator writers since I suppose they would be happy to know when the emulators are not accurate.

Symoon wrote: ↑Mon Apr 22, 2019 6:36 am That's too bad, it really was opening doors to new things, but I suppose it has to stop at some point! Also, this very last version put the whole interrupt in page 2 (to save a JMP, 3µs), which was already risky as version 1.1 showed.

Maybe someone will find some optimizations eventually.

Have you implemented the combinatorial exploration algorithm I posted some time ago? It should definitely help create smaller files.

Symoon · Post by **Symoon** » Mon Apr 22, 2019 10:08 pm

NekoNoNiaow wrote: ↑Mon Apr 22, 2019 9:36 pm
Symoon wrote: ↑Mon Apr 22, 2019 6:36 am The decoding is probably too demanding, there's actually only a 3µs margin left on emulators, and I suppose real hardware doesn't cope with it. Hey, that's a difference between real machines and emulators
Is this the case for all emulators? Out of curiosity which ones did you try?
Also, I guess you could submit the issue to emulator writers since I suppose they would be happy to know when the emulators are not accurate.

I tried with both Euphoric, and Oricutron (1.2). Both load correct bytes, while real Orics don't seem to.
What I fail to understand, now that I've spent 1 more hour on it, is that I was wrong: it's not a 3µs margin but rather somthing like 12µs (based on Oricutron), which should be far enough to work.
How I wish I could see the Oric memory content!

NekoNoNiaow wrote: ↑Mon Apr 22, 2019 9:36 pm Have you implemented the combinatorial exploration algorithm I posted some time ago? It should definitely help create smaller files.

Not yet, sorry. I was focused on this new version which could have saved a full second, annd took me about 15 different test versions - too messy to start another change

But I will eventually! What I'd need first would be to understand why it doesn't work on real Orics to definitely get this 1.3 version out of my mind

Symoon · Post by **Symoon** » Tue Apr 23, 2019 9:12 am

Ok, by loading bytes on screen, I've been able to understand the beginning of something: it's actually not the timing problem I was thinking of, but a silence between two sub-parts that causes a wrong reading of the following byte, on real machines only. The same silence that, IIRC, I had tried to remove but put back a while ago as it caused problems without it on some programs.

I have no clear explanation yet but it's obvious there's something there.
So I'm back with multiple testings... When I have time to

Symoon · Post by **Symoon** » Wed Apr 24, 2019 9:45 am

Yes!
Calculations showed I was apparently less than 1µs short on real machines (playing the WAV file at 43500Hz instead of 44100 worked

), and I found a 2µs optimisation in the 'guilty' part of code.

So now, I have to finish the page 2 swapping to place the interrupt there and restore page 2 once loaded, and I hope Novalight 1.3 will work.
Only drawback: it's apparently not working on Euphoric anymore.

On optimising: I might end up mystic or crazy, beliving that all that is required to find optimisations is to stare at the code with a threatening look, for one hour, then the code gets tired of it and surrenders, and you find a way to code it better

Symoon · Post by **Symoon** » Fri Apr 26, 2019 9:37 pm

Found a way to, I hope, avoid using page 2.
So far, Novalight 1.3 works with Oricutron, and some programs work on real Atmos, others load but don't work. Got to find why.

The figures would be:

Code: Select all

             Novalight 1.1       1.2       1.3
-------------------------------------------------
Zorgon:           14.8s          14.5s     13.4s
Hires ship:       3.1s	         3.1s      3s

(that's including short silences at the beginning and end, so it's actually 0.4s faster. And still without the new possible dictionary optimisation

)

NekoNoNiaow · Post by **NekoNoNiaow** » Sat Apr 27, 2019 3:19 am

Symoon wrote: ↑Wed Apr 24, 2019 9:45 am Yes!

\(^^)/

Symoon wrote: ↑Wed Apr 24, 2019 9:45 am Only drawback: it's apparently not working on Euphoric anymore.

Then tell F. Frances to fix Euphoric! If it works on real hardware and not emulators, the fault lies with the emulator, not your code.

And in any case, you should also tell the Oricutron guys to fix it as well if it does not behave like the actual hardware.
If the emulator was correct, you would have detected the issue while using it and it would have been much easier to debug.

Oh, and by the way, have you published the sources to 1.2?

Symoon · Post by **Symoon** » Sat Apr 27, 2019 8:37 am

I'm afraid I've been optimistic yesterday. I still don't understand why, but I get few random loading errors. I know something must be too slow somewhere (slowing down from 44100Hz to 43500 loads perfectly), but errors occur on various bytes types (compressed or not), strange. And a bit depressing!

NekoNoNiaow wrote: ↑Sat Apr 27, 2019 3:19 am
Symoon wrote: ↑Wed Apr 24, 2019 9:45 am Only drawback: it's apparently not working on Euphoric anymore.
Then tell F. Frances to fix Euphoric! If it works on real hardware and not emulators, the fault lies with the emulator, not your code.

And in any case, you should also tell the Oricutron guys to fix it as well if it does not behave like the actual hardware.
If the emulator was correct, you would have detected the issue while using it and it would have been much easier to debug.

I do agree, but I have to check my code first, it's more likely that the problem is on this side

Anyway, as I can't check exactly what happens on the real machine, it's hard to tell well the problem is!
Also, Euphoric might not be working because I'm using an old computer with XP (not plain DOS), and the difference may be due to slowdowns... As we're talking about 1 or 2µs differences, it might be the problem.

NekoNoNiaow wrote: ↑Sat Apr 27, 2019 3:19 am Oh, and by the way, have you published the sources to 1.2?

Of course, I don't release the thing without sources

Good luck reading them though, sorry for that

Symoon · Post by **Symoon** » Sat Apr 27, 2019 10:35 pm

I am still trying to understand the problem with Novalight 1.3 on real machines. Doing this, I had loaded Oricium, without success: I left it as it was, i.e. stuck near the end or the loading. So while the PC audio was still plugged on the Oric, I tried to reproduce the problem with Oricutron, and while typing CLOAD on the PC... Oricium on the real machine started!
Each time I pressed a key on Oricutron, I could see the scrolling of Oricium shift by one char.

So, so far, I turned Novalight 1.3 into an Oricium scrolling controller from PC! Not really what I'm expecting, I must say

Chema · Post by **Chema** » Sun Apr 28, 2019 7:42 am

In the last version I included a Press Key To Start just to avoid that, not sure if you tried and old version or pressed a key.

The thing is that the VSync hack detection routine detects noise coming from the tape in, it wrongly assumes the hack is present and identifies every pulse as the VSync signal to which all the drawing routines are tied

forum.defence-force.org

Novalight - very fast tape loading

Re: Novalight - very fast tape loading

Re: Novalight - very fast tape loading

Re: Novalight - very fast tape loading

Re: Novalight - very fast tape loading

Re: Novalight - very fast tape loading

Re: Novalight - very fast tape loading

Re: Novalight - very fast tape loading

Re: Novalight - very fast tape loading

Re: Novalight - very fast tape loading

Re: Novalight - very fast tape loading

Re: Novalight - very fast tape loading

Re: Novalight - very fast tape loading

Re: Novalight - very fast tape loading

Re: Novalight - very fast tape loading

Re: Novalight - very fast tape loading