Hossein,
It is generally the analyst's responsibility to decide what is most important to achieve the best model. In the context of hyperelastic models, stability usually refers to the Drucker stability criterion which, simplistically, states that positive energy is required to strain (deform) a material. This is often checked by ensuring positive slopes in true stress - strain curves. From a modeling perspective, it is important to ensure that your hyperelastic model meets this criterion in different loading modes (for instance, equi-biaxial extension, planar tension, and uniaxial tension) in the range of strain that is being modeled. So, an instability at, say 5%, strain is generally unacceptable, but one at, say, 100% strain, may be acceptable if the component being modeled has strains less than 10%. When a material model is unstable and you deform a component through that instability, you will get non-physical behavior which can also cause nonconvergence (often due to excessive element distortion).
Regarding the fitting error, there are different ways of calculating that, and whichever you choose is not so important as long as you use the same metric for comparing different hyperelastic models. Bearing in mind that the stress-strain response of hyperelastic materials is loading mode dependent, you want to bias (minimize) fitting errors to the the most important loading mode(s) for your component. That is, if you have a component that is predominantly loaded in compression, minimize model fit errors in that loading mode.
As you'll see, the answer to both of your questions touches on the dominant loading mode(s) in your component. If you do not know the dominant loading mode, you can run a simple FE analysis with a Neo-Hookean material model to figure that out with a biaxiality analysis. The concept is explained here:
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.544.4157&rep=rep1&type=pdf
There are different biaxiality definitions you can use, and my personal favorite is:
Biaxiality = log(lamda_min)/log(lamda_max)
because it bounds all biaxiality values between -0.5 (uniaxial tension) and -2 (equibiaxial tension). Planar tension (pure shear) falls in between these two with a value of -1.
Regards,
Travis