Line routine - Second attempt
I worked on optimizing a little bit this morning.
It turns out that the memory layout is a big performance killer: not the 6 bits/byte, but the 40 bytes/row.
If the screen buffer had been organized in columns instead of rows, we wouldn't have to update 16-bit pointers with each step in the y-direction. And in the x-direction we would only have to update them every 6th pixel.
I wonder if there is an efficient way to handle this problem.
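To make the cost concrete, here is a rough Python model of the addressing (a sketch, not the routine's actual code). It assumes the usual Oric HIRES layout: screen base $A000, 200 rows, and the leftmost pixel of each byte in bit 5. The column-organized variant is purely hypothetical.

```python
SCREEN = 0xA000       # assumed HIRES base address
ROW_BYTES = 40        # bytes per row (from the post)
PIXELS_PER_BYTE = 6   # 6 pixels per byte (from the post)
ROWS = 200            # assumed screen height

def row_major(x, y):
    """Byte address and bit mask of pixel (x, y) in the real, row-organized layout."""
    addr = SCREEN + y * ROW_BYTES + x // PIXELS_PER_BYTE
    mask = 0x20 >> (x % PIXELS_PER_BYTE)  # assumed: bit 5 is the leftmost pixel
    return addr, mask

def column_major(x, y):
    """The same pixel in a hypothetical column-organized buffer."""
    addr = SCREEN + (x // PIXELS_PER_BYTE) * ROWS + y
    mask = 0x20 >> (x % PIXELS_PER_BYTE)
    return addr, mask

# Row-organized: every step in y moves the address by 40, i.e. a 16-bit add.
print(row_major(10, 1)[0] - row_major(10, 0)[0])        # 40
# Column-organized: a step in y is +1, which fits an 8-bit index register.
print(column_major(10, 1)[0] - column_major(10, 0)[0])  # 1
```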
Have fun!
thrust26
You have been programming the 6502 for how long and you are just now figuring out 16 bit pointers are a problem?
Seriously, there is a reason I try to stick to the 6803/9 CPUs. I ported a simple music player to the 6502 (it's somewhere on this forum) just to re-familiarize myself with the CPU, and the code size was at least 3 times that of the 6803 version (not that I claim my 6502 code is fully optimized).
I don't know that you'll get more efficient than a page 0 (16bit) pointer for Y and then index off of it for X.
Adding/Subtracting 40 to/from the Y pointer shouldn't cause all 16 bits to be updated more than every 4 lines or so, but it does look ugly in the code.
FWIW, this is one of the strongest cases for adding a special case for horizontal lines (Y1 = Y2). You can remove the 16 bit Y code from inside the loop.
JamesD wrote: You have been programming the 6502 for how long and you are just now figuring out 16 bit pointers are a problem?
Not at all. They are just time consuming. And with a clever memory setup you can reduce that time. Unfortunately the Oric setup is not that clever.
JamesD wrote: I don't know that you'll get more efficient than a page 0 (16bit) pointer for Y and then index off of it for X. Adding/Subtracting 40 to/from the Y pointer shouldn't cause all 16 bits to be updated more than every 4 lines or so, but it does look ugly in the code.
Every 6 lines. But instead of simply indexing with Y, you have to add some extra logic. And that costs time.
JamesD wrote: FWIW, this is one of the strongest cases for adding a special case for horizontal lines (Y1 = Y2). You can remove the 16 bit Y code from inside the loop.
That's all already in the code.
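As a quick sanity check of how often adding 40 carries into the high byte of the pointer, the following Python sketch steps a pointer down the screen and counts the high-byte changes. The base address $A000 and the 200-row height are assumptions about the Oric HIRES screen; only the 40 bytes/row comes from this thread.

```python
SCREEN = 0xA000   # assumed HIRES base address
ROW_BYTES = 40    # bytes per row
ROWS = 200        # assumed screen height

def high_byte_changes(start=SCREEN, rows=ROWS, step=ROW_BYTES):
    """Count how often the pointer's high byte changes while stepping down the screen."""
    changes = 0
    addr = start
    for _ in range(rows - 1):
        nxt = addr + step
        if (nxt >> 8) != (addr >> 8):
            changes += 1
        addr = nxt
    return changes

changes = high_byte_changes()
# Number of carries, and the average number of lines between two carries.
print(changes, (ROWS - 1) / changes)
```

Under these assumptions the high byte changes 31 times over 199 steps, so roughly once every 6.4 lines (256 / 40).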
Have fun!
thrust26
thrust26 wrote: Not at all. They are just time consuming. And with a clever memory setup you can reduce that time. Unfortunately the Oric setup is not that clever.
I was just giving you a hard time.
Just remember that what is clever for one application, might be a nightmare for another.
Hi.
Of course I am very interested in this routine and the new optimizations... I just updated from the svn server and cannot compile the new version. _TableMod6 is missing.
Anything wrong here?
Could you please post here when there is a more or less final version, so I can integrate it into my programs to check the improvements in speed?
Chema wrote: Hi. Of course I am very interested in this routine and the new optimizations... I just updated from the svn server and cannot compile the new version. _TableMod6 is missing. Anything wrong here?
My fault. Committed display.s now too.
Chema wrote: Could you please post here when there is a more or less final version, so I can integrate it into my programs to check the improvements in speed?
That should be sometime today, with only minor improvements from then on.
Have fun!
thrust26
Chema wrote: Thanks thanks thanks...
Don't expect too much overall improvement; there are other things which cost a lot of time.
BTW: How did you do your tests without the double buffer? Did you calculate everything (all points, polygons etc.) before you finally erased and redrew line by line, so that the screen updates are as close together as possible?
If not, try that once please. If yes, try again with the much faster line draw code.
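The two update orders can be compared with a small Python model. It is only a sketch: lines are reduced to hypothetical integer ids, all endpoints are assumed to be calculated beforehand, and each erase or draw counts as one time step, which ignores that real lines take different times to draw.

```python
def erase_all_then_draw(n):
    """Clear every old line first, then draw every new one."""
    return [("erase", i) for i in range(n)] + [("draw", i) for i in range(n)]

def line_by_line(n):
    """Erase one old line and immediately draw its replacement."""
    ops = []
    for i in range(n):
        ops += [("erase", i), ("draw", i)]
    return ops

def worst_gap(ops):
    """Most steps any line spends erased before it is drawn again."""
    erased_at = {}
    worst = 0
    for step, (kind, line_id) in enumerate(ops):
        if kind == "erase":
            erased_at[line_id] = step
        else:
            worst = max(worst, step - erased_at[line_id])
    return worst

LINES = 12  # hypothetical number of lines in one ship
print(worst_gap(erase_all_then_draw(LINES)))  # 12
print(worst_gap(line_by_line(LINES)))         # 1
```

In this model the time a line is missing from the screen grows with the number of lines when everything is erased first, but stays constant when erasing and redrawing are interleaved.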
Have fun!
thrust26
thrust26 wrote: Don't expect too much overall improvement, there are other things which cost a lot of time.
I know.
I gave it a try and noticed a couple of things. It is quite a bit faster, and that is noticeable in the game's overall speed. It is much nicer: better lines and, again, nice ships in sight.
But I also found that it alters attributes when drawing totally horizontal lines. Beware that I have replaced eor(tmp0) with ora(tmp0), so I might have altered something in the process. Or maybe something is not working with chunking?
And finally... where did my memory go? I have lost 2K from my free space!!! Surely it is not just for the new tables, so I suppose I will need to have a look at how things are aligned and the space I am losing there, because this is indeed giving me trouble.
thrust26 wrote: BTW: How did you do your tests without the double buffer? Did you calculate everything (all points, polygons etc.) before you finally erased and redrew line by line, so that the screen updates are as close together as possible? If not, try that once please. If yes, try again with the much faster line draw code.
Ok, I will. This means rewriting many things in the program, so I will give it a try on the ship demo code. In fact, I saved the origin and destination of all lines drawn and cleared them just before drawing the new ones, thus using two lists of line endpoints. And the flickering was horrible. But the line routine was much slower, so I will try again.
Another ugly side effect of drawing with eor is that when two lines converge, they erase each other, so some ship corners and details are lost.
And I will have to work out a similar process for circles, stars and every other in-screen info...
Regards and thanks indeed again for your help here...
Chema wrote: I gave it a try and noticed a couple of things. It is quite a bit faster, and that is noticeable in the game's overall speed. It is much nicer: better lines and, again, nice ships in sight.
Sounds good.
BTW: I checked out TINE and built it, but the result doesn't seem to work. What am I missing?
Chema wrote: But I also found that it alters attributes when drawing totally horizontal lines. Beware that I have replaced eor(tmp0) with ora(tmp0), so I might have altered something in the process. Or maybe something is not working with chunking?
I will have a look at the horizontal line code. Though I didn't touch it, other changes may have influenced it.
Chema wrote: And finally... where did my memory go? I have lost 2K from my free space!!!
Oh, that's just me.
Chema wrote: Surely it is not just for the new tables, so I suppose I will need to have a look at how things are aligned and the space I am losing there, because this is indeed giving me trouble.
Understood. I am always optimizing to the limit when possible. But if you want me to save memory, I need to know what the limit is. So where would you rather save memory instead of speeding up the graphics?
How much memory do you have to work with in total? And how much is currently left?
Chema wrote: Ok, I will. This means rewriting many things in the program, so I will give it a try on the ship demo code. In fact, I saved the origin and destination of all lines drawn and cleared them just before drawing the new ones, thus using two lists of line endpoints. And the flickering was horrible. But the line routine was much slower, so I will try again.
So did you first clear all lines and then redraw all? Or did you clear and redraw line by line?
I really would like to see your experiments with my own eyes (well, emulated), too.
Chema wrote: Another ugly side effect of drawing with eor is that when two lines converge, they erase each other, so some ship corners and details are lost.
Yes and no. Yes, they erase each other, but when too many lines come close together, you don't get an unstructured pixel blob. With double buffering I would still go for OR, but XOR is IMO not that bad, especially if using it gives a massive speed improvement.
It's all just a matter of compromises and which ones are easier for YOU to accept.
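The corner effect is easy to reproduce in a toy model. The Python sketch below uses a hypothetical 8x8 one-bit screen and two pre-computed edges that share an end point; it is not the line routine itself, just an illustration of what XOR and OR plotting do to the shared pixel.

```python
WIDTH, HEIGHT = 8, 8  # hypothetical toy screen, one value per pixel

def plot_points(screen, points, mode):
    """Plot pre-computed points using 'xor' or 'or'."""
    for x, y in points:
        if mode == "xor":
            screen[y][x] ^= 1
        else:
            screen[y][x] |= 1

def draw_corner(mode):
    """Draw two edges of a ship that meet at the corner pixel (4, 0)."""
    screen = [[0] * WIDTH for _ in range(HEIGHT)]
    edge_a = [(x, 0) for x in range(5)]  # horizontal edge ending at (4, 0)
    edge_b = [(4, y) for y in range(5)]  # vertical edge starting at (4, 0)
    plot_points(screen, edge_a, mode)
    plot_points(screen, edge_b, mode)
    return screen

print(draw_corner("xor")[0][4])  # 0: the shared corner pixel is erased
print(draw_corner("or")[0][4])   # 1: the corner survives
```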
Chema wrote: And I will have to work out a similar process for circles, stars and every other in-screen info...
Sure, if you decide on a change, you have to touch quite a lot of code again.
Chema wrote: Regards and thanks indeed again for your help here...
I am having fun, so it is a pleasure.
Have fun!
thrust26
thrust26 wrote: BTW: I checked out TINE and built it, but the result doesn't seem to work. What am I missing?
Mmmm not sure. Did you get any error message? I am using a utility called taptap to correctly set up the filenames inside the disk. If you don't have it (which is possible) then you end up with a disk with nonamexxx.com instead of the correct name, so the game is not launched. You can launch it manually though.
I am not sure where taptap is... I think in the repository.
thrust26 wrote: I will have a look at the horizontal line code. Though I didn't touch it, other changes may have influenced it.
That is quite strange... I will also have a look then. If you did not touch that code, then the bug could be there or maybe I introduced it without noticing...
EDIT: Ok, it was an old bug that has appeared again.
In draw_totaly_horizontal8 there is this code:
Code:
ldx _OtherPixelX
sta __auto_cpx+1   ; stores A, although the value was just loaded into X
It should be:
Code:
ldx _OtherPixelX
stx __auto_cpx+1   ; stores X, the value that was just loaded
thrust26 wrote: Understood. I am always optimizing to the limit when possible. But if you want me to save memory, I need to know what the limit is. So where would you rather save memory instead of speeding up the graphics? How much memory do you have to work with in total? And how much is currently left?
Well, I always prefer a bigger routine which is optimized to the maximum, and stick to it, but I had only 3.7K left in main memory, which I planned to use for missions, and now I have 1.6K left, which might be a bit low.
I can use from $500 to $9fff in main memory, plus page 4 (already mostly used) and page 2 (I have plans for it). In overlay I am using nearly all of the 16K.
Well, I tend to eat up all the memory, as you do, but in my case it is due to my fat code... well structured, but fat.
thrust26 wrote: So did you first clear all lines and then redraw all? Or did you clear and redraw line by line? I really would like to see your experiments with my own eyes (well, emulated), too.
I can't remember really... I think I erased all, then drew all, as I only had the rotating ship in sight. Doing it ship-by-ship could be easy; doing it line-by-line... I am not sure.
thrust26 wrote: Yes and no. Yes, they erase each other, but when too many lines come close together, you don't get an unstructured pixel blob. With double buffering I would still go for OR, but XOR is IMO not that bad, especially if using it gives a massive speed improvement. It's all just a matter of compromises and which ones are easier for YOU to accept.
You are also right here, but whenever I compare the ship pics between the 6502 eor versions and the speccy, I end up with the same conclusion: the speccy version looks nicer.
But I am biased, so I have to try. Gimme some time to sort it out and I will send you a demo with eor, right?
thrust26 wrote: I am having fun, so it is a pleasure.
That is the spirit, but you are being of great help anyway.
Chema wrote: Mmmm not sure. Did you get any error message? I am using a utility called taptap to correctly set up the filenames inside the disk. If you don't have it (which is possible) then you end up with a disk with nonamexxx.com instead of the correct name, so the game is not launched. You can launch it manually though. I am not sure where taptap is... I think in the repository.
It compiles without error into a .tap file. But shouldn't this be a .dsk file?
Also I got the latest files (e.g. taptap) from dBug, so I should have everything I need.
Chema wrote: EDIT: Ok, it was an old bug that has appeared again.
Fixed.
Chema wrote: Well, I always prefer a bigger routine which is optimized to the maximum, and stick to it, but I had only 3.7K left in main memory, which I planned to use for missions, and now I have 1.6K left, which might be a bit low.
I see. I think one page can be regained by rearranging code alignments. But the code got quite a lot larger and I added 3 new 240 byte tables. Not sure why this sums up to 2.1k, I would expect maybe 1.5k.
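For a feel of what such tables cost, here is a Python sketch of three 240-entry lookup tables. Only _TableMod6 is named in this thread; the other two tables, their names and their contents are guesses made for illustration, as are the 240-pixel screen width and the bit order.

```python
SCREEN_WIDTH = 240  # assumed HIRES width in pixels; matches the 240 byte table size

# Hypothetical tables, indexed by x, that replace a division by 6 at run time.
table_div6 = [x // 6 for x in range(SCREEN_WIDTH)]           # byte column of pixel x
table_mod6 = [x % 6 for x in range(SCREEN_WIDTH)]            # pixel position inside the byte
table_mask = [0x20 >> (x % 6) for x in range(SCREEN_WIDTH)]  # assumed: bit 5 is leftmost

total_bytes = len(table_div6) + len(table_mod6) + len(table_mask)
print(total_bytes)  # 720
```

Three tables of this size account for 720 bytes, so the rest of the lost space would have to come from the larger code and from alignment padding.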
Chema wrote: I can use from $500 to $9fff in main memory, plus page 4 (already mostly used) and page 2 (I have plans for it). In overlay I am using nearly all of the 16K.
Too much detail for my little hardware knowledge.
That's ~56k, right? And you are down to just 2k now?
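The estimate can be checked with a few lines of Python, assuming the address range is inclusive, each page is 256 bytes, and the whole 16K overlay is counted:

```python
main_ram = 0x9FFF - 0x0500 + 1  # $500..$9fff inclusive
pages = 2 * 256                 # page 2 and page 4
overlay = 16 * 1024             # 16K overlay
total = main_ram + pages + overlay
print(main_ram, total, total / 1024)  # 39680 56576 55.25
```

So the total is about 56,500 bytes, or a little over 55K when counting 1K as 1024 bytes.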
Chema wrote: Well, I tend to eat up all the memory, as you do, but in my case it is due to my fat code... well structured, but fat.
You can't have both. Since I am coding for the Atari 2600, my code cannot afford much structure. Subroutines especially have to be avoided there, since each level eats up 2 bytes of the available 128 bytes of RAM.
Chema wrote: I can't remember really... I think I erased all, then drew all, as I only had the rotating ship in sight. Doing it ship-by-ship could be easy; doing it line-by-line... I am not sure.
In theory ship-by-ship should cause a bit more flicker and a bit less tearing than line-by-line. I don't know your data structure, but I suppose line-by-line requires quite some reorganization. But maybe you can do this for your test code.
Chema wrote: You are also right here, but whenever I compare the ship pics between the 6502 eor versions and the speccy, I end up with the same conclusion: the speccy version looks nicer.
Close-ups, definitely. But in most cases ships are quite far away, and then the differences disappear soon.
Chema wrote: But I am biased, so I have to try. Gimme some time to sort it out and I will send you a demo with eor, right?
OR, XOR, all things you test with, please.
Have fun!
thrust26
I tried to build Space1999 this morning and I was missing taptap. I don't think it's in the OSDK.
The horizontal line code looks like it can be sped up. It appears to plot one pixel at a time, whereas the end bytes could come from tables and the middle bytes could be done 6 bits (a whole byte) at a shot, like I mentioned before. I might tinker with it a bit.
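A minimal Python sketch of that idea follows; it is not the routine from the repository. The mask tables and their names are hypothetical, bit 5 is assumed to be the leftmost pixel, and an empty bitmap byte is assumed to be $40. The sketch ORs the pixels in, so it never touches bit 6.

```python
BYTES_PER_ROW = 40

# Hypothetical end-byte tables (bit 5 assumed to be the leftmost pixel).
LEFT_MASK = [0x3F >> i for i in range(6)]                   # pixels i..5 of a byte
RIGHT_MASK = [(0x3F << (5 - i)) & 0x3F for i in range(6)]   # pixels 0..i of a byte

def hline(row, x1, x2):
    """OR a horizontal run x1..x2 (inclusive) into one 40-byte screen row."""
    if x1 > x2:
        x1, x2 = x2, x1
    first, last = x1 // 6, x2 // 6
    left = LEFT_MASK[x1 % 6]
    right = RIGHT_MASK[x2 % 6]
    if first == last:
        row[first] |= left & right   # both ends fall into the same byte
        return
    row[first] |= left               # partial byte at the left end
    for col in range(first + 1, last):
        row[col] |= 0x3F             # middle bytes: 6 pixels per store
    row[last] |= right               # partial byte at the right end

row = [0x40] * BYTES_PER_ROW         # assumed: $40 is an empty bitmap byte
hline(row, 4, 14)
print([hex(b) for b in row[:3]])     # ['0x43', '0x7f', '0x78']
```

An 11-pixel run needs three byte updates here instead of eleven single-pixel plots.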
JamesD wrote: The horizontal line code looks like it can be sped up. It appears to plot one pixel at a time, whereas the end bytes could come from tables and the middle bytes could be done 6 bits (a whole byte) at a shot, like I mentioned before. I might tinker with it a bit.
Yes, that should definitely work.
I haven't touched the code there yet, because completely horizontal (or vertical) lines should occur very rarely. I doubt the benchmark will go down by even 1 after this optimization.
Have fun!
thrust26