EPG translation script - an open project - help required.

There are 33 replies in this Thread which was already clicked 6,460 times. The last Post () by Chris230291.

    • Official Post

    With thousands of channels out there, many of which have dual audio, a way to translate EPG data has been on my wish list for a very long time.

    Finally I have found a way to do so using argos-translate, see https://github.com/argosopente…nslate?tab=readme-ov-file

    Argos Translate is good in that it works offline from a downloaded "dictionary" and, more importantly, is free with no API call limits.

    All of the other translators I am aware of cost a fortune when you're trying to translate large files. For context, just the sly Germany EPG has over 300 000 lines of data, which is billions of characters.


    Below is the script I am currently using. Obviously you will need to install all the required modules and argos-translate; see its readme file for details.

    I have tried several ways of speeding it up while still maintaining the correct XML format, but all my attempts to speed it up any further have failed.

    I am not sure if argos-translate supports multithreading, but if it does I can't get it to work correctly. Half the issue is that it doesn't really support XML structure either.
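
    One way around the XML problem is to never hand XML to the translator at all: parse the file, translate only the text inside the elements you care about, and let the XML library write the structure back out. The sketch below is only an illustration of that idea. The `translate` argument is a stand-in for whatever translation call is actually in use, and `fake_translate` and the sample data are made up for the demo.

```python
import xml.etree.ElementTree as ET

def translate_xmltv(xml_text, translate, tags=("title", "desc"), target_lang="en"):
    """Translate only the text of the chosen elements; the XML itself is never translated."""
    root = ET.fromstring(xml_text)
    for programme in root.iter("programme"):
        for child in programme:
            if child.tag in tags and child.text and child.text.strip():
                child.text = translate(child.text.strip())
                child.set("lang", target_lang)  # mark the element as translated
    return ET.tostring(root, encoding="unicode")

# Hypothetical stand-in translator, used here only so the example runs offline.
def fake_translate(text):
    return {"Nachrichten": "News"}.get(text, text)

sample = (
    '<tv><programme channel="Demo.de" start="20241029180000 +0000" '
    'stop="20241029190000 +0000">'
    '<title lang="de">Nachrichten</title>'
    '</programme></tv>'
)
result = translate_xmltv(sample, fake_translate)
print(result)
```

    Because the parser owns the tags and attributes, the output stays well-formed no matter what the translator returns.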

    From what I have read, a decent GPU may be the fastest way to do things, but again I'm limited to my lappie with its built-in AMD chipset graphics.


    So here is the issue: it takes over 4 hours to translate the 5 files in the script, and that is only for one EPG provider!! :exploding head:

    I could refine the data to only what I require, mostly sport, movie and documentary channels, but ultimately I would still want all the data, for all the sats, and that would still take forever.


    So I am after help in refining the script to make it faster while still maintaining good XML structure.

    And/or other users to also translate EPG data; then, if we share the translations, we could all get more EPG sources translated to English.


    Anyways I'm probably waffling now, script below:





    As I said earlier, it takes ages to translate the data. Here are my run details from earlier today:





    Here is the source as I have added it to /etc/epgimport

    For now I am just storing the files locally.



    Ultimately you end up with this:

    slydeEPG.jpg    slyde222.jpg



    The translated data is not perfect, but it is mostly fine. Besides, my broken English is way better than my broken German.

    Anyways, I hope there is some interest in this; it would be a load easier for everyone if a few peeps get on board and help.

    Let me know your thoughts. Any help with it would be appreciated :beer1:

  • This looks like a great project. For a long time, normal users have been asking for bulk translation of EPG to English to get a proper understanding of current and upcoming events.


    This will improve a lot of functions like poster downloads, ratings, event search, etc.


    Unfortunately I am not a programmer, but I wonder whether limiting the scope of translation to services located in bouquets, instead of translating the entire EPG file, would make it much faster.


    Also limit the translation to a specific number of days instead of leaving it open; I think limiting it to 48 hrs would make sense.
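
    As a rough sketch of both suggestions, the check below decides whether a programme is worth translating from its channel id and its XMLTV start/stop times. The function names and the `wanted_channels` set are hypothetical; in practice the set would have to be built from the channels that are actually in your bouquets.

```python
from datetime import datetime, timedelta, timezone

def parse_xmltv_time(value):
    """Parse an XMLTV timestamp such as '20241029180000 +0000'."""
    return datetime.strptime(value, "%Y%m%d%H%M%S %z")

def in_scope(channel, start, stop, now, wanted_channels, hours=48):
    """True if the programme is on a wanted channel and overlaps the next `hours` hours."""
    if channel not in wanted_channels:
        return False
    window_end = now + timedelta(hours=hours)
    return parse_xmltv_time(stop) > now and parse_xmltv_time(start) < window_end

# Example values, assumed for illustration only.
now = datetime(2024, 10, 29, 12, 0, tzinfo=timezone.utc)
wanted = {"DeluxeRap.de"}

print(in_scope("DeluxeRap.de", "20241029180000 +0000", "20241030000000 +0000", now, wanted))
print(in_scope("Other.de", "20241029180000 +0000", "20241030000000 +0000", now, wanted))
print(in_scope("DeluxeRap.de", "20241101180000 +0000", "20241101190000 +0000", now, wanted))
```

    Programmes that fail the check can be left untranslated or dropped from the output entirely.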


    I appreciate you taking the time to start this project and hope for more contributions from the experts here.


    Regards

  • jenseneverest

    Look into concurrent.futures for multithreading.

    Here's an example:

    Code
    import concurrent.futures

    strings = ["dog", "cat", "fish", "mouse", "elephant"]

    def printyMcPrinter(string):
        print(string)

    # Each string is handed to a worker thread; leaving the "with" block waits for them all to finish.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.map(printyMcPrinter, strings)

    Or


  • Here is a basic working example, using Google Translate (just easier for me).

    Proxies are required to bypass rate limits. I find Tor works.



    This input:


    Gives:



    You could use string.capwords() or similar to fix capitalisation, if that bothers you.
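
    For reference, a quick look at what string.capwords() does. The sample strings are made up. Note that it capitalises every whitespace-separated word and lowercases the rest of each word, so all-caps names get flattened; capitalising only the first character is a gentler alternative.

```python
import string

text = "non-stop rap. your after-work beats."

# capwords() capitalises each whitespace-separated word.
fixed = string.capwords(text)
print(fixed)

# It also lowercases the remainder of each word, which hurts all-caps names.
flattened = string.capwords("RIN und Snoop Dogg")
print(flattened)

# Gentler alternative: only touch the very first character.
first_only = text[:1].upper() + text[1:]
print(first_only)
```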

  • Here's my 2 pence worth via ChatGPT. Some ideas for you.

    This uses deep_translator
    concurrent.futures
    asyncio
    Autodetect language
    Only translates today's entries by default. Enter the number of days you want; the others are skipped.
    Outputs to the console every 10 seconds to show how many it has translated.

    Don't know how long it takes. It's been running for 20 mins now and has translated 5500 entries.

    Note: like Jensen's, this is Windows Python 3 code, not Enigma2 Python code.


    ** A person who feels appreciated will always do more than what is expected **

  • OK, on my main computer rig, which is quite high end:
    13th Gen Intel(R) Core(TM) i5-13400F, 2500 MHz, 10 Core(s), 16 Logical Processor(s)
    and 32 GB of RAM

    Obviously I can't compare that with Jensen's laptop....

    It took 36 mins to translate those 4 files, for just today's EPG entries.


  • Code
    translated_text = GoogleTranslator(source='de', target='en').translate(text)

    That's hard-coded. Use “auto” instead of “de”.

    Also, the translator fails silently, so are you sure it really did translate successfully?
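
    A minimal sketch of how a script could notice silent failures: wrap the translation call and flag anything that raised, came back empty, or came back unchanged. `translate_checked` and the fake translators are hypothetical names for illustration. An unchanged result is only "suspect", since titles such as proper names can legitimately stay the same.

```python
def translate_checked(text, translate):
    """Return (result, ok); ok is False if the call raised, returned nothing, or echoed the input."""
    try:
        result = translate(text)
    except Exception:
        return text, False
    if not result or result.strip() == text.strip():
        return text, False
    return result, True

# Fake translator for the demo: pretends "Welt" was silently dropped by the service.
def flaky_translate(text):
    return {"Hallo": "Hello"}.get(text)

texts = ["Hallo", "Welt"]
results = [translate_checked(t, flaky_translate) for t in texts]
suspect = [t for t, (_, ok) in zip(texts, results) if not ok]
print(f"{len(suspect)} of {len(texts)} entries need a second look: {suspect}")
```

    Logging the suspect count at the end of a run makes it obvious when a rate limit has kicked in part-way through a file.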

    Seems like chatgpt really overcomplicated it… or maybe I'm just dumb. :dizzy face:

  • It was auto... Chatgpt obviously changed again in later amends. It tries to be clever, even though you keep saying use autodetect. :)

    And did it translate? Yep. I have 4 output files

    pasted-from-clipboard.png

    4 new compressed files

    pasted-from-clipboard.png

    And an example of the XML for today. It even changed the lang attribute to lang="en" in the output. Cool.

    pasted-from-clipboard.png


    • Official Post

    I asked ChatGPT if it could speed up the download time, and it replied


    and..


    Code
    with ThreadPoolExecutor(max_workers=10) as executor:
        translated_descriptions = await asyncio.get_running_loop().run_in_executor(executor, translate_texts, descriptions_to_translate)
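
    One thing worth noting about the snippet above: a single run_in_executor() call runs the whole list in one worker thread, so max_workers=10 would not add any parallelism by itself. A runnable sketch that submits one job per text instead is shown below. `translate_all` and the upper-casing stand-in translator are assumptions for the demo, not part of the ChatGPT reply.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def translate_all(texts, translate_one, max_workers=10):
    """Run a blocking translate_one(text) call for each text across a thread pool."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # One executor job per text, so up to max_workers calls run at the same time.
        tasks = [loop.run_in_executor(executor, translate_one, text) for text in texts]
        # gather() returns the results in the same order as the inputs.
        return await asyncio.gather(*tasks)

# Stand-in for a blocking translation call, used only so the example runs offline.
def fake_translate(text):
    return text.upper()

translated = asyncio.run(translate_all(["hallo", "welt"], fake_translate))
print(translated)
```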
    • Official Post

    cache ?



  • It was auto... Chatgpt obviously changed again in later amends. It tries to be clever, even though you keep saying use autodetect. :)

    Did every single programme that was meant to be translated actually get translated?

    Scroll to the bottom and double check.

    The reason I mention this is that I hit rate limits very quickly when I wasn't going via the tor network.


    I added to my example:

    - Caching (not convinced it made a difference)

    - Days to translate

    - Workers variable

    - Completion time

    - Simplified it further.


    I realise it doesn't handle the downloading, extracting, etc, but that's easily added.


    The pre-downloaded basic sky de EPG took "0h 4m 24s" translating all fields for 1 day with 60 workers (I have a feeling this is a little high).
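
    On the caching point in the list above, a minimal sketch of the idea, with hypothetical names: remember each source string the first time it is translated and reuse the result for repeats. It can only help when the same title or description appears more than once (repeat airings, for example), which may be why the gain is hard to see.

```python
def make_cached(translate):
    """Wrap a translate(text) callable so each distinct text is only translated once."""
    cache = {}

    def cached(text):
        if text not in cache:
            cache[text] = translate(text)
        return cache[text]

    return cached

# Stand-in translator that records how often it is really called.
calls = []

def fake_translate(text):
    calls.append(text)
    return text.upper()

translate = make_cached(fake_translate)
outputs = [translate(t) for t in ["Total Rap", "Nachrichten", "Total Rap"]]
print(outputs, "real calls:", len(calls))
```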


  • OK, so let's have a look.

    _common was the last to finish

    pasted-from-clipboard.png

    Searching for today's date from the end of the file gives the last entry for today that needed translating:


    Code
     <programme channel="DeluxeRap.de" start="20241029180000 +0000" stop="20241030000000 +0000">
      <title lang="de">
       Total Rap
      </title>
      <desc lang="en">
       Non-stop rap. Your after-work beats.
      </desc>
     </programme>


    before translation

    Code
      <programme start="20241029180000 +0000" stop="20241030000000 +0000" channel="DeluxeRap.de">
        <title lang="de">Total Rap</title>
        <desc lang="de">Non-Stop Rap. Deine Feierabend-Beats.
    
    Rap nonstop, den ganzen Abend lang. Hier haben deine Lieblingsstars ein Zuhause, hier lebt der HipHop. Old School trifft New School, von Brooklyn bis Berlin, von Snoop Dogg bis RIN und über Nicki Minaj und Future wieder zurück.</desc>
      </programme>


    As shown in the code above, I was also restricting translations to the text before the first line break, and only to programme descriptions, not titles.
    I was just doing a proof of concept.
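
    For anyone wanting to try the same shortcut, cutting a description at its first line break is a one-liner. `first_paragraph` is just an illustrative name, and the sample text is the description from the XML above.

```python
def first_paragraph(desc):
    """Return only the text before the first line break of a description."""
    return desc.strip().split("\n", 1)[0].strip()

desc = (
    "Non-Stop Rap. Deine Feierabend-Beats.\n"
    "\n"
    "Rap nonstop, den ganzen Abend lang. Hier haben deine Lieblingsstars ein Zuhause, "
    "hier lebt der HipHop."
)
short = first_paragraph(desc)
print(short)
```

    Sending only this first line keeps the amount of text per request small, at the cost of losing the longer synopsis.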


  • Ah OK.

    Perhaps it's not the number of requests, but the number of words in those requests.


    I just ran mine set to 0 days (all programmes in file) and it took 12 mins to translate everything.

    I don't know if the cache is doing work, or if the if statement to calculate whether to translate the programme is expensive.

  • And notes to Jensen.
    You were originally using argos-translate. This is an offline translator, so the power of your device matters.
    deep_translator, which I am using, makes API calls, so it depends on a good internet connection. It also has better translations.
    The only concern, as Chris mentions, is getting blocked or hitting limits (I don't know what the limits are). So it's about finding the happy medium.

    After slightly adjusting my original code, it now finishes in about 21 mins for 4 URLs, 1 day of EPG, 60 workers.
    So we have given you a few things to experiment with.
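
    Since the actual limits are unknown, one common way to look for that happy medium is to retry failed calls with an increasing delay rather than hammering the service. The sketch below assumes the translator raises an exception when it is blocked; `with_retries`, the delay values and the fake translator are all made up for illustration.

```python
import time

def with_retries(translate, text, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call translate(text), waiting 1x, 2x, 4x... base_delay between failed attempts."""
    for attempt in range(attempts):
        try:
            return translate(text)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, let the caller see the error
            sleep(base_delay * 2 ** attempt)

# Fake translator that fails twice before succeeding, to exercise the retry loop.
failures_left = [2]

def flaky_translate(text):
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise RuntimeError("simulated rate limit")
    return text.upper()

delays = []  # record the waits instead of really sleeping in this demo
result = with_retries(flaky_translate, "hallo", sleep=delays.append)
print(result, delays)
```

    Combined with a lower worker count, this trades a little speed for a better chance that every entry really gets translated.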



  • I just ran mine without using a proxy, every day and every attribute.

    It completes in 0h 6m 58s and everything seems to be translated.

    To be clear, my code is based on an old project of mine from years ago.

    Back then, a proxy was 100% needed.

    Seems like it's not any more?

  • So we both seem to be around the 5 min mark per URL.
    I assume your above message was for just 1 URL.
    I am just trying mine now including titles and not breaking on line breaks.

    ** edit - titles and full text are taking a long time: 18 mins in and no output yet **

    When jenseneverest returns to this thread he will be like .... :nerd face: :exploding head: :scratch_one-s_head:

