
    Hardlimit test bench

    • whoololon Veteranos HL

      Ah, so.

      Thanks for the clarification.

      For the next version, in the final message when collecting data, fix "Retriving data".


      ...the voices tell me so...

      • cobito Administrator

        I have a question about some results. Let's see if anyone knows why this could happen.

        The other day, @krampak uploaded some Ryzen 7 2700X results. One in particular, run with 32 threads, caught my attention. This model has 8 cores with SMT, that is, 16 hardware threads. Compared with a 16-thread validation uploaded by @krampak the same day, presumably from the same machine, the results are curious.

        Leaving aside the memory tests, in the single-thread tests the 32-thread validation scores 1.2% higher than the 16-thread one. That small gap suggests the conditions were the same in both runs (same background load), and we also know both were run at stock frequency.

        In the multi-thread tests, by contrast, the 32-thread validation scores 5.3% higher than the 16-thread one. The way the benchmark works in multi-thread mode is very simple: it launches one process per thread, each process does its operations and calculates its score, and finally the scores of all the processes are added up. In other words, a higher score means more operations were completed in the same time.

        I can only think of two possibilities:

        1. With 32 threads, the benchmark takes a larger share of the CPU, taking processing time away from background programs.
        2. It makes better use of the processor's pipeline.
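The first possibility can be made concrete with a toy model. This is only a sketch under loud assumptions: an idealized scheduler that gives every runnable thread an equal share of the hardware threads, a single busy background thread, and a made-up `rate_per_thread`. None of these names or numbers come from the benchmark itself.

```python
def benchmark_total(workers: int, hw_threads: int, background: int = 0,
                    rate_per_thread: float = 1000.0) -> float:
    """Sum of per-worker scores under an idealized equal-share scheduler.

    Hypothetical model: every runnable thread (benchmark worker or
    background task) gets the same slice of the available hardware threads.
    """
    runnable = workers + background
    # A thread gets a whole hardware thread if there are enough of them,
    # otherwise an equal fraction of what is available.
    share = min(1.0, hw_threads / runnable)
    per_worker_score = share * rate_per_thread
    # The benchmark adds up the scores of all its workers.
    return workers * per_worker_score


def percent_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# With no background load, oversubscribing changes nothing in this model.
idle_16 = benchmark_total(16, 16)
idle_32 = benchmark_total(32, 16)

# With one busy background thread, 32 workers claim a larger share than 16.
busy_16 = benchmark_total(16, 16, background=1)
busy_32 = benchmark_total(32, 16, background=1)
print(f"gain with background load: {percent_gain(busy_32, busy_16):.2f}%")
```

In this model the total is capped by the hardware, so extra workers can only help by crowding out other programs. A gain of a few percent is therefore plausible on a machine with some background activity.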

        If I remember correctly, @kynes also noticed this behavior a while ago. A third possibility is a bug in the program, but after reviewing the code I can't see how that could happen. Besides, the integer test result is practically the same in both cases, and a bug in the program would have had to show up in every test.

        Anyway, I leave it there in case you feel like pondering for a while.

        All the latest news on the Hardlimit front page
        My gear

        • whoololon Veteranos HL

          What I'm saying is that, with all the fuss about UserBenchmark, you need a serious site to compare processors... that's it.

          • cobito Administrator @whoololon

            As I mentioned in the other thread, the program is now signed with a certificate from a CA approved by Microsoft. So from now on, this is getting serious.

            The Windows SmartScreen warning still appears. In theory, the program (and especially the certificate) has to gain reputation. There are three ways to do that:
            · Leave the executable in a public place. Over time, it will gain points. This is done.
            · Download it from different sites. The more it is downloaded and executed, the more points it gains. I read that it works better if done from Internet Explorer or Edge, but in practice I think it makes little difference. The point is simply to download and run it (no need to run the tests or validate). This is where you can lend a hand.
            · Leave the executable in any folder (such as the downloads folder) so that Windows telemetry can see it.

            That way, the warning message should disappear in a matter of days.

            @whoololon said in Hardlimit test bench:

            What I'm saying is that, with all the fuss about UserBenchmark, you need a serious site to compare processors... that's it.

            Well yes, it may be a good time to get things moving. I have some ideas...
            By the way, the typo you pointed out in the text has already been fixed.

            Regarding the central, there have been a couple of minor updates. The most important ones (details in the first post of the thread):
            · It now shows the OC ranking on each processor page.
            · It now shows the Spanish version when a browser set to Catalan, Galician, Basque, Asturian or Occitan is detected.
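The language rule in the last bullet could be sketched roughly as follows. This is not the site's code: the function name, the use of the `Accept-Language` header, and English as the default language are all assumptions made for illustration.

```python
# Primary language subtags that should get the Spanish version:
# Spanish itself plus Catalan, Galician, Basque, Asturian and Occitan.
SPANISH_FALLBACK = {"es", "ca", "gl", "eu", "ast", "oc"}


def pick_site_language(accept_language: str, default: str = "en") -> str:
    """Choose "es" or "en" from an Accept-Language header value (hypothetical helper)."""
    ranked = []
    for position, item in enumerate(accept_language.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        quality = 1.0  # a missing q-value means full preference
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        # Sort by descending quality, then by original position.
        ranked.append((-quality, position, tag))

    for _, _, tag in sorted(ranked):
        primary = tag.split("-")[0]
        if primary in SPANISH_FALLBACK:
            return "es"
        if primary == "en":
            return "en"
    return default


print(pick_site_language("ca-ES,ca;q=0.9,en;q=0.8"))  # Catalan browser gets Spanish
```

The idea is simply that speakers of these languages are more likely to read Spanish than the default language, so the first matching preference wins.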

            • whoololon Veteranos HL

              I've been taking a look, and it's not clear to me whether the results shown, both in the processor description and in the ranking table, are the best scores (processor overclocked to the hilt in a specific configuration for tests like Xevipiu), the average of all validated results for that processor, or only those run at stock frequency...

              Thanks in advance.

              • cobito Administrator @whoololon

                @whoololon Both in the processor description (cpu.php) and in the processor and architecture rankings, only results without OC (stock frequency) are used to compute the average. Overclocked results appear in a separate table on each model's page (if there are any).

                In the results of a validation (result.php), the data corresponds to that validation alone, without taking other validations into account. The user ranking table that appears both on the home page and in the validation result uses individual validations, including overclocked processors, with no averaging.

                In summary: the processor pages and rankings compute an average of the validations at stock frequency. If there are overclocked results for a model, they are shown separately on its page. The user rankings show individual results (no averaging), including overclocked validations.
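The three rules in the summary can be sketched as below. The `Validation` record, its field names, and the sample data are hypothetical; the real central may store and compute things quite differently.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class Validation:
    # Hypothetical record layout, for illustration only.
    user: str
    model: str
    score: float
    overclocked: bool


def model_average(validations, model):
    """Average of the stock-frequency validations of a model (None if there are none)."""
    stock = [v.score for v in validations if v.model == model and not v.overclocked]
    return fmean(stock) if stock else None


def overclocked_results(validations, model):
    """Overclocked validations of a model, listed separately and never averaged in."""
    return [v for v in validations if v.model == model and v.overclocked]


def user_ranking(validations):
    """Individual validations, overclocked ones included, best score first."""
    return sorted(validations, key=lambda v: v.score, reverse=True)


# Made-up sample data.
sample = [
    Validation("ana", "CPU A", 1000.0, False),
    Validation("ben", "CPU A", 1100.0, False),
    Validation("cho", "CPU A", 1500.0, True),
    Validation("dan", "CPU B", 900.0, False),
]
print(model_average(sample, "CPU A"))       # stock average only
print(overclocked_results(sample, "CPU A"))  # shown in a separate table
print(user_ranking(sample)[0].user)          # best individual result
```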

                I don't know if that was what you were asking.

                • whoololon Veteranos HL

                  Yes, that was it; thanks for the clarification.

                  • kynes Veteranos HL @cobito

                    @cobito There is something that has me puzzled. My laptop's processor has 4 cores and 4 threads, but it gains about 25,000 points in multithreading if I set 8 threads instead of 4. I understand that this must be because it takes up more processor time that way, but then the results would not be fully consistent unless the maximum possible number of threads is used. Would there be any way to test more than 8 threads, to see what result it gives? If it is marginally higher or similar, it is a matter of processor usage. If it is much higher, it must be a bug in the benchmark.

                    Hardlimit.png

                    • cobito Administrator @kynes

                      @kynes First of all, there is a discrepancy between the program's results and the central's that has not been corrected yet, because I am still thinking about how to give the most reliable result possible. The program uses the old method, which takes the maximum result of each test. The central averages the 10 samples per test. As a result, the program shows larger variations between runs, while in the central those differences (which can be caused by background processes) are filtered out and are less noticeable.

                      Having said that, of the 4 results you have sent (2 with 4 threads and 2 with 8 threads), I have chosen the extremes to have the worst possible case: the one that gave the lowest score in 4 threads and the one that gave the highest score in 8 threads. The difference in the total multithread score is 4.7%. For reference, the differences between the two results at 4 threads and at 8 threads are 1.3% and 1.2% respectively.

                      To me, a 4.7% difference between the extremes, against a 1.2% difference between validations with the same number of threads, seems normal given how many things Windows 10 runs in the background.

                      But something that could be a bug in the program (and one that would be as hard to diagnose as to fix, if it really is a bug) is that in the 8-thread tests there is a peak in the scores of the first sample. That is probably why you measured such a large difference in the program's results, which take those peaks, whereas in the central's results the difference is much smaller because the average is used.
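To see why the two scoring methods react so differently to a first-sample peak, here is a small sketch. The sample values are invented for illustration and are not taken from any validation; only the two methods (maximum in the program, mean of 10 samples in the central) come from the thread.

```python
from statistics import fmean


def score_max(samples):
    """Old method used by the program: keep the best sample."""
    return max(samples)


def score_mean(samples):
    """Method used by the central: average all samples."""
    return fmean(samples)


def percent_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Invented data: 10 samples per run, with a peak in the first 8-thread sample.
run_4_threads = [1000, 1002, 998, 1001, 999, 1000, 1003, 997, 1000, 1000]
run_8_threads = [1150, 1010, 1008, 1012, 1009, 1011, 1010, 1007, 1013, 1010]

gain_by_max = percent_gain(score_max(run_8_threads), score_max(run_4_threads))
gain_by_mean = percent_gain(score_mean(run_8_threads), score_mean(run_4_threads))
print(f"max method:  {gain_by_max:.1f}% gain")
print(f"mean method: {gain_by_mean:.1f}% gain")
```

A single outlier decides the whole result under the maximum method, while the mean dilutes it across the ten samples.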

                      I will see if I have a moment and prepare the version without thread limit that you mention so that you can test that.

                      • cobito Administrator @kynes

                        @kynes Here is a modified version without a thread limit. In general, the multi-threaded result seems to be proportional to the number of cores regardless of the excess threads, although there is a slight improvement when the number of threads exceeds the processor's. But on one machine the thread synchronization failed without being detected, generating a meaningless result. That PC has 4 cores with HT, and it seems to fail from 32 threads upward.

                        This version only works in FPU and AVX2 mode, and its results are not valid.

                        • kynes Veteranos HL @cobito

                          hardlimit-2.png

                          I understand that if you took the average of the threads, the result would be coherent, but there is one that is going off the rails. I'm going to try with 128 to see what happens.

                          • kynes Veteranos HL

                            With 128 threads I think I hold the world record in multithreading:

                            hardlimit-3.png

                            Well, and if I don't have it, I'll try it with 256 threads to see what happens.

                            • cobito Administrator @kynes

                              @kynes There is a clear difference here. I am seeing it on my machines too. I suppose that in the end I will have to apply a kind of truncated mean: something like discarding the two highest values and the two lowest, and averaging the remaining 6. It is clear that the outliers at the beginning distort the measurement.
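The truncated mean described here is easy to sketch. The function name and the sample values are made up; the only thing taken from the post is the rule itself (10 samples, drop the two highest and the two lowest, average the remaining 6).

```python
def truncated_mean(samples, trim=2):
    """Average of the samples left after dropping the `trim` lowest and `trim` highest."""
    if len(samples) <= 2 * trim:
        raise ValueError("not enough samples to trim")
    ordered = sorted(samples)
    kept = ordered[trim:len(ordered) - trim]
    return sum(kept) / len(kept)


# Invented run of 10 samples with a peak in the first one.
samples = [1150, 1010, 1008, 1012, 1009, 1011, 1010, 1007, 1013, 1010]
print(f"plain mean:     {sum(samples) / len(samples):.1f}")
print(f"truncated mean: {truncated_mean(samples):.1f}")
```

Because the peak is among the two highest values, it is discarded before averaging and no longer affects the score.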

                              I am also going to review the synchronization mechanism, to see if the fault was there.

                              By the way, be careful with the 256 threads, because if the system crashes and the processes lose communication with each other, they can remain permanently waiting consuming 100% of all the cores and you will need to either close each process manually or restart the PC.

                              • cobito Administrator @kynes

                                @kynes said in Hardlimit test bench:

                                With 128 threads I think I hold the world record in multithreading:

                                Well, and if I don't have it, I'll try it with 256 threads to see what happens.

                                The synchronization has failed there. Basically, handfuls of threads are running at different times, and their scores are being added up as if they had all run at once.
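A toy model shows how badly this kind of desynchronization can inflate the total. It assumes an idealized equal-share scheduler and a made-up `rate_per_thread`; the `concurrent` parameter (how many workers really overlap in time) is a hypothetical knob, not something the benchmark exposes.

```python
def reported_total(workers: int, hw_threads: int, concurrent: int,
                   rate_per_thread: float = 1000.0) -> float:
    """Sum of per-worker scores when only `concurrent` workers overlap in time.

    Idealized model: the workers that run at the same moment split the
    hardware threads equally, and the scores of all workers are then added
    up as if they had run simultaneously.
    """
    if not 1 <= concurrent <= workers:
        raise ValueError("concurrent must be between 1 and workers")
    share = min(1.0, hw_threads / concurrent)
    return workers * share * rate_per_thread


# 128 workers on a 4-thread CPU.
in_sync = reported_total(128, 4, concurrent=128)  # all measured together
out_of_sync = reported_total(128, 4, concurrent=8)  # measured in small groups
print(f"inflation factor: {out_of_sync / in_sync:.0f}x")
```

When the workers are truly synchronized, the total cannot exceed what the hardware delivers. Once they drift apart, each small group measures itself with far less competition and the sum stops meaning anything.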

                                • cobito Administrator

                                  Some conclusions I draw from this:

                                  1. When the test bench runs with a thread count 4 times higher than the number of processor threads, the program fails and gives absurd results. This does not worry me, because the count will be limited to double the processor's threads.

                                  2. When more threads are run than the processor has (but below an absurd amount), spurious peaks usually occur in the first sample. Here are results with double the threads on three different models:

                                  Core i7-6820HQ
                                  164e1e16-b9bf-45b8-8e7f-fae74fc5dc9e-imagen.png

                                  Pentium N3540
                                  961505a0-7879-41fa-9790-550d4d25386a-imagen.png

                                  Core i5-7300HQ
                                  949ab679-b4d4-4cf5-8647-e2aa46f8f547-imagen.png

                                  That this happens on three different models makes it clear that it is a general behavior. Curiously, these peaks are seen in tests 1, 2 and 4, but not in test 3. That makes a program bug seem unlikely, but it cannot be ruled out yet.

                                  Below are the same models with a number of threads equal to the CPU:

                                  1. Core i7-6820HQ
                                    651ed880-8bb5-429f-b6e9-0e8644b2855e-imagen.png

                                  2. Pentium N3540
                                    18f486bd-e741-4130-9317-2d92842f2f53-imagen.png

                                  3. Core i5-7300HQ
                                    e186ba3e-a25e-4bb9-88b2-03ad9e5885b1-imagen.png

                                  If there are peaks, they are practically imperceptible.

                                  3. On the i5 and the i7, a sustained improvement is seen throughout the test. Here there are only two options: either a larger share of the CPU is being taken, or the pipeline is being better utilized. The Core i7-6820HQ is in a PC with many programs running in the background, plus an antivirus. The Pentium N3540 runs a bare Windows 10, with nothing running in the background and no antivirus. Perhaps that is why doubling the number of threads does not give it a sustained improvement.

                                  In general

                                  What distorts the result shown in the program with double the threads is the initial peak. The problem is that I don't know why it occurs. If it were the program, I would expect a peak in all tests, or a peak at the beginning and another at the end, but that is not what happens. Could it be some cache effect? Honestly, I have no idea. But it is strange, and the program will need to be reviewed.

                                  • krampak Global Moderator

                                    And why don't we remove the option to specify the number of threads and force it to always run with the number of cores available on the CPU? I say this to avoid unnecessary differences between users.

                                    My configuration

                                    • cobito Administrator @krampak

                                      @krampak Initially, the reason for having the freedom to specify the number of threads was in case HT/SMT detection failed. So far, that hasn't happened once. So there's no reason to keep it.

                                      From the point of view of measuring processor performance, choosing the number of threads is useful for seeing how a processor performs at half load, which is actually how processors are used most of the time. But if it's already difficult to receive validations with the default configuration, it's much harder with slightly exotic ones.

                                      Here a solution, as you say, is either to remove the possibility of choosing the number of threads or perhaps to limit the maximum to the number of threads of the processor to leave that possibility open.
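The option of capping the thread count could look something like this. The function and the `max_factor` parameter are hypothetical names; `max_factor=1` corresponds to capping at the processor's thread count, and `max_factor=2` to the "double the threads" limit mentioned earlier in the thread.

```python
import os


def clamp_thread_count(requested: int, hw_threads: int, max_factor: int = 1) -> int:
    """Limit a user-requested thread count to at most `max_factor` times the hardware threads."""
    if hw_threads < 1 or max_factor < 1:
        raise ValueError("hw_threads and max_factor must be at least 1")
    return max(1, min(requested, hw_threads * max_factor))


# os.cpu_count() can return None, so fall back to 1 in that case.
detected = os.cpu_count() or 1
print(clamp_thread_count(32, detected))                # capped at the CPU's threads
print(clamp_thread_count(32, detected, max_factor=2))  # capped at double
```

Lower values are left untouched, so half-load runs remain possible while oversubscribed runs are not.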

                                      Another possibility (which is complementary) is to do a truncated mean, something that will eventually be applied because this would correct all the results that have been sent so far.

                                      And another possibility is to directly ignore the first sample; a simple but not elegant solution.

                                      The method on how to calculate the final score is something that I have been thinking about for a long time (hence the program and the central use different criteria). The truncated mean is the one that is winning because it avoids this type of problems of unknown origin and because it filters the result in PCs with moderate background load.

                                      The test bench has several mechanisms to prevent tampering with results; early in development, that was the part that took the most hours. Fortunately, this problem can be fixed retroactively. One thing is clear: one of the objectives is to measure the real performance of the machine, without any particular program configuration offering a performance advantage that does not exist.

                                      I say this to make it clear that this is not a trivial problem and that it will be solved. Until then, I remain open to suggestions.

                                      • whoololon Veteranos HL

                                        Let me see if I understand it correctly: the program executed by default shows reliable results, but when adjusting the parameter of how many threads we want it to use, it is prone to showing "unusual" results...

                                        If that is the case, personally I would only allow results obtained with the default settings to be validated, at least until the issue is resolved... the last thing we need is an army of show-offs cheating at solitaire and, in the process, falsifying the ranking (which is the most serious part for me; after all, I think the point is for the table to be reliable).

                                        The option to choose the number of threads can be maintained, warning that it may give erroneous results and that they are not "official", for those who like tinkering.

                                        • cobito Administrator

                                          Most likely by tomorrow or the day after (from there, it depends on how long Microsoft takes to certify it), version 1.4 will be ready. It fixes the inflated scores issue, among other changes.

                                          If any of you have a current and powerful processor, I could use a screenshot of the CPU tab to put on the Store page, since I only have a few older PCs around here.

                                          • whoololon Veteranos HL

                                            This is the most powerful thing I have in my house, I don't know if it's what you're looking for.
                                            (attached screenshot)
