It doesn't matter whether your encoder produces video at 10 FPS or 30 FPS: the RTP timestamp tells the receiver how long the pause between two frames is, and you determine that on the fly for each frame. That way you can send 10 frames in one second (10 fps) and 30 frames in the next (30 fps); you only need to set the RTP timestamp correctly. And if I understand your question, you are unsure how to do that...
Let the starting timestamp be 0. For each new frame, add the elapsed wall-clock time in milliseconds, multiplied by 1000, to the previous RTP timestamp. You can use any time scale you want, as long as the receiver knows which clock rate you chose (video payload formats commonly use a 90 kHz clock). To make the decoder play a 10 fps recording back at 30 fps, add about 33 ms worth of ticks (33300 at this scale) to the RTP timestamp of each frame instead... but let's look at your example:
Frame #    RTP Time    Time between frames [ms]
[ 1]              0           0
[ 2]          50000          50
[ 3]          90000          40
[ 4]         123000          33
So if you set the RTP timestamps like this (elapsed time in ms * 1000), you will make the decoder load and decode Frame 1, then load and decode Frame 2 but sleep for 50 ms (the time difference between Frame 1 and Frame 2) before it draws Frame 2, and so on...
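Here is a minimal sender-side sketch of that calculation in C, assuming the same 1000-ticks-per-millisecond scale as the table above. The send_rtp_packet() stub, the dummy 1000-byte frames, and the use of CLOCK_MONOTONIC are just placeholders for illustration, not part of any real RTP library:

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

#define TICKS_PER_MS 1000u   /* clock scale used in the table above */

/* Placeholder: a real application would build and send an RTP packet here. */
static void send_rtp_packet(uint32_t rtp_timestamp, const uint8_t *frame, size_t len)
{
    (void)frame;
    printf("frame of %zu bytes, RTP timestamp %u\n", len, (unsigned)rtp_timestamp);
}

/* Monotonic wall-clock time in milliseconds. */
static uint64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
}

/* Call once per encoded frame, whenever the encoder happens to produce one. */
static void send_frame(const uint8_t *frame, size_t len)
{
    static uint64_t last_ms;
    static uint32_t rtp_timestamp;   /* starting timestamp: 0 */
    static int have_last;

    uint64_t t = now_ms();
    if (have_last)
        rtp_timestamp += (uint32_t)((t - last_ms) * TICKS_PER_MS);
    last_ms = t;
    have_last = 1;

    send_rtp_packet(rtp_timestamp, frame, len);
}

int main(void)
{
    uint8_t dummy[1000] = {0};
    struct timespec gap = {0, 50 * 1000000L};   /* ~50 ms between frames */

    for (int i = 0; i < 4; i++) {
        send_frame(dummy, sizeof dummy);
        nanosleep(&gap, NULL);
    }
    return 0;
}

Note that the timestamp increment comes from the measured time between frames, not from any nominal frame rate, which is exactly why variable frame rates "just work".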
And as you can see, the decoder uses the RTP timestamps to know when to display each frame, and it doesn't care whether the video was encoded at 30 fps or 10 fps.
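The receiving side is the same idea in reverse. Again just a sketch: decode_frame() and draw_decoded_frame() are hypothetical hooks, and a real player would schedule frames against a playout clock and a jitter buffer rather than sleeping, but the timestamp arithmetic is the same:

#include <stdint.h>
#include <stddef.h>
#include <time.h>

#define TICKS_PER_MS 1000u   /* must match the clock rate the sender used */

/* Hypothetical hooks into your decoder/renderer -- not a real API. */
void decode_frame(const uint8_t *payload, size_t len);
void draw_decoded_frame(void);

/* Called once per complete received frame, in playback order. */
void present_frame(uint32_t rtp_timestamp, const uint8_t *payload, size_t len)
{
    static uint32_t prev_timestamp;
    static int have_prev;

    decode_frame(payload, len);

    if (have_prev) {
        /* Unsigned subtraction also copes with the 32-bit timestamp wrap-around. */
        uint32_t delta_ms = (rtp_timestamp - prev_timestamp) / TICKS_PER_MS;
        struct timespec wait = {delta_ms / 1000, (long)(delta_ms % 1000) * 1000000L};
        nanosleep(&wait, NULL);   /* e.g. 50 ms between Frame 1 and Frame 2 */
    }
    prev_timestamp = rtp_timestamp;
    have_prev = 1;

    draw_decoded_frame();
}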
Also, a 30 fps video does not mean there will be 30 RTP packets per second. A single frame is often split across several RTP packets (sometimes there are more than 100 packets in a second), and all packets belonging to the same frame carry the same timestamp, so you cannot derive the correct RTP timestamp from a simple per-packet formula.
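A rough sketch of that packetization is below, assuming a made-up send_rtp_packet_ex() helper and ignoring the payload-format-specific headers a real packetizer would add; the point is only that the timestamp advances per frame, never per packet (for video, typically only the last packet of a frame sets the RTP marker bit):

#include <stdint.h>
#include <stddef.h>

#define MAX_PAYLOAD 1400   /* assumed per-packet payload budget (fits a typical MTU) */

/* Made-up low-level sender; 'marker' maps to the RTP marker bit. */
void send_rtp_packet_ex(uint32_t rtp_timestamp, int marker,
                        const uint8_t *data, size_t len);

/* Every packet of the frame gets the SAME timestamp; the timestamp only
 * advances between frames, no matter how many packets each frame needs. */
void send_large_frame(uint32_t rtp_timestamp, const uint8_t *frame, size_t len)
{
    size_t offset = 0;
    while (offset < len) {
        size_t chunk = len - offset;
        if (chunk > MAX_PAYLOAD)
            chunk = MAX_PAYLOAD;
        int last = (offset + chunk == len);   /* marker bit on the last packet */
        send_rtp_packet_ex(rtp_timestamp, last, frame + offset, chunk);
        offset += chunk;
    }
}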
I guess that this is what you need... hope I helped, don't -1 me if I didn't... =)