Tuesday, 24 August 2021

Good listener

Objective

To record and play audio

The product should have the following features:
1. Records 'good' quality audio
2. Record continuously - with no time limit*
3. Play available audio data - oldest first
4. Share audio data with phone / computer
5. Upgrade over wifi

*store audio data in a circular buffer (of limited memory), when full, overwrite 'older' recording

From a standalone product perspective, there should also be battery power, charging via micro usb and soft shutdown features.

From a user perspective, there should be appropriate webpage/app, audio error/instruction messages, buttons and LEDs.

Key components




Components functionality - initial understanding

  1. Mic [on + record]
    • capture human audible frequencies (20 Hz - 20 kHz)
    • especially 1-2 kHz
  2. ADC [on + record]
    • convert mic output voltage to a digital stream
    • resolution
    • sampling frequency
  3. MP3 Encoder [on + record]
    • can be software or hardware
    • digital output of ADC to .mp3
  4. Microcontroller
    • [on + record]
      • store mp3 in memory (circular buffer)
    • [on + play / error] fetch mp3 from memory, supply to decoder
    • [on + share]
      • enter access point+server mode
      • make audio recording (mp3) available on web page (with size and timestamp)
    • [initiated on web page]
      • update wifi settings/radio url in memory
      • Perform OTA upgrade (enter station mode and connect to router, initiate https OTA API)
    • [on + radio]
      • enter station mode and connect to router
      • get radio stream and buffer (as needed)
      • play radio stream
  5. MP3 Decoder [on + play / error / radio]
    • can be software / hardware
    • decode mp3 to playable digital stream
  6. DAC [on + play / error / radio]
    • convert decoder output (digital stream) to analog voltages
    • resolution
  7. Amplifier [on + play / error / radio]
    • DAC output amplification of voltage
  8. Speaker [on + play / error / radio]
    • amplifier output to sound waves
    • respond to volume control
  9. Memory
    • store audio recording in loop
    • store error(x).mp3
    • store radio buffering (if required)
    • store wifi credentials (microcontroller and external router)
    • store webpage data (if required)
    • store audio recording size and timestamp
    • store radio URL
    • store software version
    • store any other metadata
  10. Power management
    • battery rechargeable
    • voltage regulation
    • charging mechanism - microusb connector
    • battery low / full indicator - RGB led
  11. User input
    • [record button] audible sound
    • [play button] full recording play via speaker
    • [share button] connect and view web page
    • [radio button] play radio via speaker
    • volume control, on/off button (switch debouncer)
  12. Webpage [on share]
    • show list of audio recordings with size and timestamp
    • play audio recording
    • download audio recording
    • update wifi settings (microcontroller and external router)
    • update radio url
    • upgrade - software version as read-only display
  13. Router
    1. provide connection to
      1. internet - radio server [on + radio]
      2. https server running on computer [on upgrade]
  14. Mobile [on share]
    1. connect to esp wifi
    2. open esp ip webpage
Chosen ESP8266 for microcontroller - covers # 4
Chosen VS1053 as audio recorder/player module - covers # 1-3, 5-7

Other audio recorder modules that could be explored in future:

  1. ISD1820: calcuttaelectronics, components101, amazon
  2. ISD1760: electronicscomp, datasheet
  3. ISD1806: amazon

Chosen VS1053 because of the quality of documentation, support group and openness about internal workings, formats used etc.


Mini projects

  1. Hello mic and speaker
  2. ESP8266: enter access point mode, server mode [on button press], show webpage with text 'hello world' via an IP address
  3. Save .txt in memory and display on webpage
  4. Save multiple files in memory and retrieve (hello file system)
  5. Capture user button press on web page and retrieve user submitted info
  6. Enter station mode and connect to internet router, get text from example.com
  7. OTA upgrade
  8. Digital audio play mic --> vs1053 board --> speaker (ESP in between)
  9. User input with ESP GPIOs with switch debouncer
  10. Digital audio: file encoding
  11. Internet radio: decoding and playing
  12. Battery indicator with LEDs


Introduction to esp8266

ESP8266 setup - initial study and environment setup to write and run code


FreeRTOS notes - notes from official documents


ESP8266 examples - experience with the official examples (RTOS SDK)


Using nodemcu module which in turn has esp-12e module of esp8266.


Introduction to audio

  1. intro to sound formats
  2. mp3 and aac explained (digikey)
  3. audio file size calculator
  4. audio formats and file types
  5. vs1053 board - sparkfun link
  6. vs1053 LC technology board schematic



VS1053 notes from datasheet

  1. Suggested schematic vs actual: when designing your own, understand the purpose of each component & hence the chosen value
  2. GBUF usage, common voltage, to eliminate need of large isolation capacitors for earphones - not clear
  3. LINE IN and mic should not be connected at same time - how to disable one when other in use
  4. DREQ usage - start (or finish) next sci/sdi operation after checking the status of dreq - not clear why finish word is used
  5. SDI is for audio data
  6. byteRate? fast playback? playSpeed? absolute play position? positionMSec? mentioned in decode_time
  7. how much speed is increased by playspeed? how to calc if uC can send data for fast play (expected by vs)
  8. wram and wram_addr: X/Y memory, instruction word 32 bits
  9. Decode_time can be used to change bitrate
  10. do we need to fill AI_ADDRR with same info as WRAMADDR? application notes
  11. page 48 - what is the benefit of putting higher (24MHz) freq?
  12. Hw reset doubles as full power down vs low power mode (power vs time to get back)
  13. fast fwd and rewind without audio
  14. SPI speed should match the bitrate
  15. checking 2 word (32 bit) variables twice to ensure value of high bit has not changed (by the time you get to it after reading low byte) - not clear, pg 58
  16. extra parameters: latestjump not clear, jumppoints - exact implementation not clear

ESP8266 with VS1053


Sample projects


1. Sample project (Arduino code) to play mp3 (hardcoded) - baldram’s (most useful)

2. Link explaining a problem with vs1053 module board & provides fix (hardware as well as a software) - incorporated in baldram’s code

3. Arduino project to setup single radio station - based on baldram’s code

4. Project using esp32 freertos sdk api - based on baldram’s code, adds infrared etc. support

5. Code connecting nodemcu with vs1053 - clarity of spi pins on nodemcu

6. Understanding of SPI clock polarity and phase


About the first (baldram's) sample project

  1. Arduino spi.cpp - implementation of all important SPI functions
  2. Arduino esp8266_peri.h having mapping of Arduino and ESP8266 registers. Using this file along with spi_register.h (from RTOS SDK API), you can figure out ESP register names & their purpose (to some extent)
  3. This code plays a hardcoded mp3, but for learning (and to be able to do the same in RTOS), modified the code to just read and write vs1053 registers

Arduino (modified) code flow -  just read and write registers

  1. Serial.begin (115200)
  2. VS hard reset - using gpio
  3. SPI.begin
    1. HSPI pins setup (SCK, MOSI, MISO)
    2. setFrequency 1 MHz
    3. Register settings (SPI1C=0, SPI1U=MOSI|MISO|SSE, SPI1U1=MOSI|MISO, SPI1C1=0)
  4. Player.begin
    1. set cs/dcs as o/p & 1, dreq as i/p; delay 100ms
    2. wait for dreq to be 1
  5. [called 3 times] [read_all_registers] —> loop on [read_register]
    1. control mode on
      1. SPI.beginTransaction: while (SPI1CMD & SPIBUSY) {} then apply settings (2 MHz, MSB 1st, SPI mode 0)
      2. cs=0, dcs=1
    2. SPI.write(3) - read instruction
    3. SPI.write(reg_address) // SPI1W0 (buffer register) = data
    4. SPI.transfer (0xFF<<8|0xFF) // SPI1W0 = data, ORing with FF is done to clear old values
    5. wait for dreq to be 1
    6. control mode off
      1. cs=1
      2. SPI.endTransaction // empty
  6. [called 1 time] write_register
    1. control mode on
    2. SPI.write(2)
    3. SPI.write(reg_address)
    4. SPI.write(value>>8) // SPI1W0 = data, calling ‘write’ instead of ‘transfer’
    5. SPI.write(value) // called 2 times to send 16 bits
    6. wait for dreq to be 1
    7. control mode off
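
Both flows above reduce to a 4-byte SCI exchange. A minimal sketch of how those bytes are laid out, assuming the opcodes from the steps above (2 = write, 3 = read) and a 16-bit value sent high byte first. The helper names sci_build_write, sci_build_read and sci_parse_read are hypothetical and are not part of the Arduino library or the RTOS SDK.

```c
#include <stdint.h>

#define SCI_OP_WRITE 0x02  /* write instruction, as in SPI.write(2) above */
#define SCI_OP_READ  0x03  /* read instruction, as in SPI.write(3) above  */

/* Fill a 4-byte frame for writing a 16-bit value to a register. */
void sci_build_write(uint8_t frame[4], uint8_t reg, uint16_t value)
{
    frame[0] = SCI_OP_WRITE;
    frame[1] = reg;
    frame[2] = (uint8_t)(value >> 8);    /* high byte first */
    frame[3] = (uint8_t)(value & 0xFF);  /* then low byte   */
}

/* Fill a 4-byte frame for reading a register. The last two bytes are
   dummy 0xFF values clocked out while the chip answers on MISO. */
void sci_build_read(uint8_t frame[4], uint8_t reg)
{
    frame[0] = SCI_OP_READ;
    frame[1] = reg;
    frame[2] = 0xFF;
    frame[3] = 0xFF;
}

/* Combine the two bytes received during the dummy phase. */
uint16_t sci_parse_read(uint8_t hi, uint8_t lo)
{
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```

The frame would be sent between 'control mode on' and 'control mode off', with the dreq wait before releasing cs.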

Key files referred in underlying esp rtos sdk api

  1. spi_struct.h: spi_dev_t structure represents all the esp registers
  2. spi.h: spi_config_t represents the interface
  3. spi_register.h: defines all low level bits within esp registers
  4. spi.c: creates object of spi_config_t —> spi_object_t that helps talk to the peripheral, once it has value, registers are filled using this object


Links on ESP SPI clock configuration registers

    - Metalphreak’s explanation, implementation

    - another implementation

    - links to right arduino files



ESP SPI_CLOCK register


1 0000000000000 000011 000001 000011 (0x80003043) initial default = sysclk


0 0000000001001 100111 010011 100111 (0x2674e7) pre=9, clk_div = SPI_2MHz_DIV


0 0000000000000 100111 010011 100111 (0x274e7) pre=0, clk_div = SPI_2MHz_DIV


Reading right to left (lsb->msb): the first 6 bits are clkL, the next 6 clkH, the next 6 clkN, the next 13 the prescaler; the last bit is 1 if clock = sysclk, else 0

Clock (MHz) = [ 80 / (clkN+1) ] / (prescaler+1), where 80 is the system clock in MHz
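
The layout and formula above can be checked with a small decoder. This is a sketch for the host side, assuming the 80 MHz system clock used in the formula; spi_clock_hz is a hypothetical helper name and is not an SDK function.

```c
#include <stdint.h>

#define ESP_SYSCLK_HZ 80000000u  /* 80 MHz, as in the formula above */

/* Decode the SPI_CLOCK register layout described above:
   bit 31      = 1 if the SPI clock equals the system clock
   bits 30..18 = prescaler (13 bits)
   bits 17..12 = clkN, bits 11..6 = clkH, bits 5..0 = clkL */
uint32_t spi_clock_hz(uint32_t reg)
{
    if (reg & 0x80000000u) {
        return ESP_SYSCLK_HZ;
    }
    uint32_t pre  = (reg >> 18) & 0x1FFFu;
    uint32_t clkn = (reg >> 12) & 0x3Fu;
    return ESP_SYSCLK_HZ / (clkn + 1u) / (pre + 1u);
}
```

For the three values listed above this gives 80 MHz (0x80003043), 200 kHz (0x2674e7, pre=9) and 2 MHz (0x274e7, pre=0), so a non-zero prescaler is worth checking when the bus seems slower than expected.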



SCI - serial command interface


Register read/write not working - observations & learnings

  1. esp spi_clock register understanding
  2. do not place printf/logd within ‘critical / high throughput sections’. In fact, be careful while putting it anywhere in the underlying API
  3. Keep upgrading to new sdk api (v3.3) & toolchain (v8.4.0), say once a month: espressif link. This solved one error (InStringFetchProhibited)
  4. If stuck in rtos sdk, try simple arduino code - to check your hardware connections etc.
  5. Bit-order: do not change this to 'MSB first' (let it be default 0 i.e., LSB first), just set byte-order to big-endian ('MSB first')
  6. When reading VS1053 registers, values are inconsistent at times (some garbage, then ok). To mitigate, read repeatedly and only conclude when the two most recent reads are the same. Added this check within the read_register function; also used in write_register
  7. Bits handling: esp (RTOS API) expects 32 bit data structures within SPI but register addresses are 8 bits & register values are 16 bits; pay special attention to convert 32 bit data structures to 8/16 bits using bit-shift operators
  8. Error in underlying ESP RTOS SDK API, in spi.c, in spi_master_trans where it says ‘start transmission’, after ‘SPI[host]->cmd.usr = 1;’, there should be ‘while (SPI[host]->cmd.usr);’. This ensures the program waits till SPI transmission (MISO or MOSI) completes before proceeding further. In original file, this while statement is inside the next 'if statement' under ‘receive miso data’ (2 lines below); cut-paste is required. Whether it is miso/mosi, the code should wait for cmd.usr to become 0. This idea came from the Arduino code that was checking two registers 'while (SPI1CMD & SPIBUSY) {}' while writing. Issue has been raised with Espressif on Github.
  9. In Arduino code, SPI settings are applied (within SPI.beginTransaction) every time a read / write is called. In RTOS API, spi_init is done only once in the beginning (combining what is happening in SPI.begin + SPI.beginTransaction). But that is fine. Arduino API caters to many uCs and peripherals so they may have reasons to implement it that way.

Final solution was a combination of upgrading the SDK API and toolchain, correcting the bit order & correcting the error in underlying SDK.
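
The 'read until two consecutive reads agree' check from point 6 can be sketched as below. The raw read is passed in as a function pointer so the logic can be tried on a computer; read_register_stable, reg_read_fn and fake_read are hypothetical names, and max_reads is an assumed parameter.

```c
#include <stdint.h>

typedef uint16_t (*reg_read_fn)(uint8_t reg);

/* Read a register repeatedly until two consecutive reads agree.
   Returns 0 and stores the value in *out on success, or -1 if no two
   consecutive reads matched within max_reads attempts. */
int read_register_stable(reg_read_fn read_raw, uint8_t reg,
                         int max_reads, uint16_t *out)
{
    if (max_reads < 2) {
        return -1;
    }
    uint16_t prev = read_raw(reg);
    for (int i = 1; i < max_reads; i++) {
        uint16_t cur = read_raw(reg);
        if (cur == prev) {
            *out = cur;
            return 0;
        }
        prev = cur;
    }
    return -1;
}

/* Host-side stand-in for the real SPI read: garbage once, then steady. */
static int fake_calls = 0;
uint16_t fake_read(uint8_t reg)
{
    (void)reg;
    return (fake_calls++ == 0) ? 0xFFFFu : 0x0800u;
}
```

On the device, read_raw would be the plain SCI read. Registers that change on their own need to bypass this check.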


SDI - serial data interface

Data exchange not working - observations & learnings

VS can receive max 32 bytes of SDI data

ESP can send max 64 bytes

ESP SDI not working. Tried these things but did not work: 

various ‘chunk sizes’ (8/16/32/64) while sending, various clock speeds, Arduino platform, different delays, different places for dreq wait etc.


Final solutions that worked (all 5 required)

  1. Patch (right after hard reset): set GPIO 0 & 1 to GND (problem with LC technology board): Link
  2. DCS pin not working on LC technology board: use ‘shared’ mode (setting at the beginning after hard reset): Link. Introduced separate functions control_mode_on/off, data_mode_on/off
  3. Updated write_register to remove check (read what was written) for WRAM register because it cannot be read back without changing the value of WRAMADDR - check introduced in separate function wram_read
  4. Updated read_register to remove check (read 20 times till 2 reads are equal) for WRAM register because it is too volatile (value changes unpredictably after first read)
  5. Updated write_register to remove check (read what was written) for SM_RESET bit in SCI_MODE register because it gets reset as soon as actual softreset happens
Performed all tests given in datasheet including memory test, sine test, new sine test and sweep test.
  • Good frequency to test: Fs=6 (11,025 Hz), S (sine skip speed)=2 (overall hex value: C2); F=Fs*S/128 = 172 Hz
  • Tested left & right channels with different frequencies, volume up and down
  • SCI test did not work - but that is ok since SCI is tested otherwise via read/write registers
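
The sine test byte and resulting frequency can be worked out with a few lines of C. This is a sketch, assuming the byte layout implied above (Fs index in the top 3 bits, S in the low 5 bits) and the sample rate table as I recall it from the VS1053 datasheet; verify the table against the datasheet before relying on it. The function names are hypothetical.

```c
#include <stdint.h>

/* Sample rates selected by the 3-bit Fs index (assumed datasheet table). */
static const uint32_t sine_fs_hz[8] = {
    44100, 48000, 32000, 22050, 24000, 16000, 11025, 12000
};

/* Pack Fs index (bits 7..5) and skip speed S (bits 4..0) into one byte. */
uint8_t sine_test_byte(uint8_t fs_idx, uint8_t skip)
{
    return (uint8_t)(((fs_idx & 0x07u) << 5) | (skip & 0x1Fu));
}

/* Output frequency in Hz: F = Fs * S / 128. */
double sine_test_freq_hz(uint8_t n)
{
    return (double)sine_fs_hz[n >> 5] * (double)(n & 0x1Fu) / 128.0;
}
```

With Fs=6 and S=2 the byte is 0xC2 and the frequency is 11025 * 2 / 128, roughly 172 Hz, matching the value above.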


ADPCM recording & playing

  • settings to start recording - straight from datasheet
  • apply adpcm patch
  • storing data
    • sample buffer 1024*2 bytes, each block 256 bytes, header 60 bytes (lsb first)
    • file size: sample rate 8 kHz, bit depth 4 bits so ~32 kbps (32440 bits/s = 237.6 KB/min)
    • ~4 min recording in 1 MB (space available on esp inbuilt eeprom); 1 min = ~1000 blocks, 15 s = ~250 blocks (start here)
    • understand sample rate vs audio length vs storage space (record_time_size.ods)
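
The size figures above can be reproduced with integer arithmetic. A sketch, assuming 505 samples per 256-byte mono block (my reading of the datasheet, which would explain the 32440 bits/s figure) and ignoring the 60-byte header. The function names are hypothetical.

```c
#include <stdint.h>

#define ADPCM_BLOCK_BYTES   256u  /* block size noted above              */
#define ADPCM_BLOCK_SAMPLES 505u  /* assumption: samples per mono block  */

/* Bits per second produced at a given sample rate. */
uint32_t adpcm_bits_per_sec(uint32_t sample_rate_hz)
{
    return sample_rate_hz * ADPCM_BLOCK_BYTES * 8u / ADPCM_BLOCK_SAMPLES;
}

/* Whole blocks produced in the given number of seconds. */
uint32_t adpcm_blocks(uint32_t sample_rate_hz, uint32_t seconds)
{
    return sample_rate_hz * seconds / ADPCM_BLOCK_SAMPLES;
}

/* Whole seconds of audio that fit in the given storage. */
uint32_t adpcm_seconds_in(uint32_t sample_rate_hz, uint32_t storage_bytes)
{
    uint64_t samples = (uint64_t)(storage_bytes / ADPCM_BLOCK_BYTES)
                       * ADPCM_BLOCK_SAMPLES;
    return (uint32_t)(samples / sample_rate_hz);
}
```

At 8 kHz this gives 32443 bits/s, 950 blocks per minute and 258 s (about 4.3 min) in 1 MiB, close to the rounded figures above.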


ADPCM recording not working - observations & learnings


Links referred: Adafruit forum, Arduino forum, adafruit library

VS official patches: downloads & sample code (.c)

VSDSP official forum: vs1053 playback too fast,  checklist of common ‘getting started’ problems


  • function read_register_no_chk (no double checking of value) was returning 32 bits! we were reading last 16 that were always 0 (silly internal code mistake)

  • The check "4th byte of each data block should be 0" (from datasheet) is getting lots of errors! tried clocks 2/4/5/8/16 with clockF 0x8800; settled on 10 MHz with x3.5+1 factor (default recommended in datasheet) as minimum errors but varies

  • buffer overflow: esp is slower? played with clocks but no use: played with when to start reading (hdat1>256 & <896) - did not prevent overflow but returns minimum errors

    > Above two points did not impact the final audio quality

  • read only 256 bytes at a time or 512 bytes? no difference in quality, settled for  512 as more efficient to read more (50%) of the buffer in one go

  • ‘puts’ does not work as it expects a string termination with 0x00 (or \0) - a normal occurrence in audio header/data
  • ‘putc’ works, later replaced with ‘fwrite’ for efficiency
  • ‘fread’ returns data in reverse byte order (due to the little-endianness of esp) - need to flip each byte (link)

  • code cleanup: hard_reset function also applies basic settings, separate header.h & c

  • tried different sampling rates - observations (see 'record_time_size.ods' on github)

  • this link has a # of solutions put in place by vlsi e.g. ready-made player from sd card, can look into this before implementing more features
  • vs1053 board internal schematic link
  • Finally what helped was: hard reset (+ patches) after recording and before playing. This is a must and led to successful playback





Ogg Vorbis


Chose Ogg Vorbis as VS1053 offers only this compression format for encoding. Discovered that mp3 encoding is quite a secret and not easy to replicate. Ogg Vorbis is open source and as good as (if not better than) mp3.


Intro videos from xiph.org, pdf from vlsi along with example code


Ogg Vorbis recording not working - observations & learnings

  • problem 1: not playable file
    • spi clk was at 2MHz, clockF at 4.5x, spi clk increased to 8 MHz for player

  • problem 2: recording (& playing) but repeats, skipping, lot of noise & occasional repeats/skipping, crashes
    • esp seems faster: fread replaced with fputc to slow down
  • problem 3: recording but lot of noise & occasional repeats/skipping, crashes (unreliable)
    • studied example code & incorporated encoder loop
  • problem 4: recording fine but lot of noise
    • replaced read/write with read/write_nochk (no double checking of value if read/written), added dreq after disabling interrupts & after loading plugin
  • problem 5: recording fine (excellent once) but occasional noise and crashes
    • added esp_logi (prints) after activating encoder (printing ctrl registers) & in loop (printing buffer status and 3 other variables)
    • buffer size is <256 for ~100 times before it becomes ~2000, this repeats almost consistently till the number of blocks are read
  • problem 6: recording decent but occasional noise and crashes, skipping with 16k profile
    • powering off esp, vs & switching on (in that order) sometimes fixes the crash/noise problem
Best result so far on profile 8kq05 but with crashes and noise

Things tried further on this profile

  • fresh file with: simpler read/write registers
  • checks for spi error
  • buffer value = ffff error, ignore/re-read hdat1 if value>4095
  • studied encoder plugin file in detail (in excel)
  • plugin check (load full, read back full) - lots of errors, program restarts by watchdog
  • plugin check (write 1 wram_section, read back and re-write whole section) - lots of errors, program restarts by watchdog
  • plugin check (read each byte after write & re-write) - lots of errors, program restarts by watchdog
  • dreq wait with delay and timeout (10/50 microseconds)
  • removed decoder patch
  • set DCS=1 right at the beginning & keep throughout
  • write sdi_patch right at the beginning & keep throughout
  • set SDI_SHARE right at the beginning & keep throughout
  • delays at various places (within recording/playing loops)


Finally what worked seems to be a combination of using simpler read/write functions (without double checking), skipping garbage values in HDAT1, removing decoder patch, implementing proper dreq wait with delay & timeout, making key settings (DCS pin, SDI_patch, SDI_share) in the beginning.
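
A dreq wait with delay and timeout, plus the HDAT1 garbage check, could look like the sketch below. Pin read and delay are passed in as function pointers so it runs on a computer; the 10 us step and 50 us timeout in the usage are one reading of the '10/50 microseconds' note and are assumptions, as are all names here (dreq_wait, hdat1_plausible and the fake_ stand-ins).

```c
#include <stdbool.h>
#include <stdint.h>

typedef bool (*dreq_read_fn)(void);        /* true when DREQ is high */
typedef void (*delay_us_fn)(uint32_t us);

/* Poll DREQ every step_us microseconds. Returns true once DREQ is high,
   false if it stayed low for timeout_us. */
bool dreq_wait(dreq_read_fn dreq_is_high, delay_us_fn delay_us,
               uint32_t step_us, uint32_t timeout_us)
{
    uint32_t waited = 0;
    if (step_us == 0) {
        step_us = 1;                       /* avoid an endless loop */
    }
    while (!dreq_is_high()) {
        if (waited >= timeout_us) {
            return false;
        }
        delay_us(step_us);
        waited += step_us;
    }
    return true;
}

/* HDAT1 word counts above 4095 are treated as garbage and re-read. */
bool hdat1_plausible(uint16_t words)
{
    return words <= 4095u;
}

/* Host-side stand-ins so the sketch can be tried without hardware. */
int fake_busy_polls = 0;                   /* DREQ reads low this many times */
bool fake_dreq(void)
{
    if (fake_busy_polls > 0) {
        fake_busy_polls--;
        return false;
    }
    return true;
}
void fake_delay_us(uint32_t us) { (void)us; }
```

Returning false instead of spinning forever lets the caller log the timeout and decide whether to retry or reset, which also keeps the watchdog from firing inside the wait.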


Working fine with 8kq01/02/03. Reliable, no skipping/repeating, decent amplification (loud), not a lot of noise. Does not work fine on higher qualities.


SD card was not needed since 5 minutes of recording in ogg vorbis format is taking ~500 KB which can be stored on existing internal esp flash memory.




Loop recording


Links referred

  • Useful link on vs-dsp forum
    • xDCS=1
    • Also has pcb tips:
      • analog v digital GND planes/other differences
      • separate linear regulators and corresponding caps for analog & digital
  • Basic link on how ring buffer works in dashcam: record in small files, get rid of oldest to record more
  • Another link on ring buffer in C on vs-dsp forum
  • How to introduce delay in microseconds: esp32.com, tutorialspoint, geeksforgeeks
  • Link on how to upload your own files from computer to esp8266 spiffs. This was explored so HTMLs can be created, pushed on esp32 internal memory so esp32 as an HTTP server could show it when requested by a client. Did not work as such. We ended up creating HTMLs within the code. Raw and tedious but allows you to insert dynamic data well

  • How to create 2D array at run time - link from stackoverflow

  • Add yield() or delay() inside your for or while loops that could take longer than, say 100ms. This is to give time to other tasks to run. Also to avoid reset due to watchdog timer. Github link. Also has ESP GPIO and other key information


Observations from looking at audio data in bytes (printed in hex)

  • Ogg vorbis header size is 1368 bytes (consistently, no matter what the quality or duration of the recording is)
  • It has 3 occurrences of ‘OggS’ followed by 'vorbis' with some bytes in between
  • 4th ‘OggS’ means beginning of audio data. This is then NOT followed by ‘vorbis’
  • Each ‘OggS’ occurrence means beginning of a new section (an Ogg 'page')
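
Counting page markers while data arrives in fixed-size reads can be done with a small streaming matcher that remembers how much of the marker it has seen, so a marker split across two reads is still counted. A minimal sketch; oggs_scanner_t and oggs_scan are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Streaming matcher for the 4-byte Ogg page marker "OggS". */
typedef struct {
    unsigned matched;  /* bytes of "OggS" matched so far (0..3) */
    unsigned pages;    /* complete markers seen                 */
} oggs_scanner_t;

/* Feed one chunk of recorded data; state carries over between calls. */
void oggs_scan(oggs_scanner_t *s, const uint8_t *buf, size_t len)
{
    static const uint8_t pat[4] = { 'O', 'g', 'g', 'S' };
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == pat[s->matched]) {
            if (++s->matched == 4) {
                s->pages++;
                s->matched = 0;
            }
        } else {
            /* 'O' occurs only at the start of the pattern, so a failed
               match can only restart on the current byte. */
            s->matched = (buf[i] == 'O') ? 1u : 0u;
        }
    }
}
```

The scanner costs two integers of state, so it can run inside the encoder loop on every chunk read from the VS1053.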


Approach for ring buffer implementation

  • Store header in a separate file
  • Then store 7 ogg sections/pages (~100 blocks/15 s) in a file. Cut at 8th ‘OggS’.
  • While recording, maintain ogg_count going into a particular file, switch file after 8th ‘OggS’

  • Encoder is reading 2 blocks (512 bytes) at a time. Technically, ‘OggS’ can appear at a 2-block boundary - one should check for a partial occurrence at the end of one set of 2 blocks & then see if the pattern completes at the start of the next set of 2 blocks. We put & then removed this check since it was too much code for a less probable scenario. At worst, a file will have 1 more ‘page’ & will be slightly bigger - not a problem.
  • test with file ~100 blocks i.e., ~15 sec each
  • fixed filenames (eg. 01.ogg)
  • ring buffer is array of strings (filenames)
  • oldest pointer - index of ‘next file to record to’ / ‘oldest file to play’
  • Metadata about ring in spiffs file: oldest pointer, # of files, file names
  • Final: Ring of 20 files, each having max 7 ogg sections i.e., ~15 s recording, size ~30k, so total ring size ~600k. Final quality: 8kq02
  • Crash occurs if we try to increase ring size although SPIFFS partition available size is ~1300k.
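
The ring of files described above needs very little state: one index and a count. A sketch assuming 20 slots and two-digit file names as in the notes; the struct and function names are hypothetical, and persisting the metadata to the spiffs file is left out.

```c
#include <stdio.h>

#define RING_FILES 20  /* ring size chosen above */

typedef struct {
    int oldest;  /* next slot to record to; oldest file once the ring is full */
    int count;   /* files recorded so far, capped at RING_FILES               */
} ring_t;

/* Slot to record into next. Advances the ring, so the oldest file is
   overwritten once the ring is full. */
int ring_next_record_slot(ring_t *r)
{
    int slot = r->oldest;
    r->oldest = (r->oldest + 1) % RING_FILES;
    if (r->count < RING_FILES) {
        r->count++;
    }
    return slot;
}

/* Slot of the i-th file in playback order (0 = oldest). */
int ring_play_slot(const ring_t *r, int i)
{
    int first = (r->count < RING_FILES) ? 0 : r->oldest;
    return (first + i) % RING_FILES;
}

/* Fixed file name for a slot, e.g. slot 0 -> "01.ogg". */
void ring_filename(int slot, char *out, size_t out_len)
{
    snprintf(out, out_len, "%02d.ogg", slot + 1);
}
```

Playback walks i from 0 to count-1 through ring_play_slot, which gives oldest-first order both before and after the ring wraps.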

Learnings while implementing above approach on ring buffer

  • if you read filesize (using stat function) before closing the file, it won’t be accurate
  • fclose() does not set the file pointer to null
  • Laptop charging causes electrical interference leading to vs1053 failures (dreq timeout, nothing works)
  • Nearby light switch on/off also leads vs1053 to fail while recording / playing
  • fread/fwrite return the # of items read / written (bytes, when the item size is 1). Always put error handling using that
  • copy header in each file while recording - output quality bad (noise and skipping)
  • copy header offline (after encoder is done) using fread - error
  • copy header offline using fputc - extremely slow / hangs, sometimes runs fine after erase_flash
  • while changing # of files in ring buffer, if you move from larger (say 30) # to smaller (say 25), extra files (26-29) will have to be deleted manually (individually or erase_flash)
  • Player can play with one header & then all files one by one & then finally send endfillbytes only once! This is the single biggest learning that eliminates the need to copy the header altogether!
  • Recording quality also improved with this change: ~0 skips, no error even without erase_flash, good quality (less noise) - code v18
  • v19 - introduced separate play_file function: ‘almost’ same quality



Introducing buttons 

To start/stop recording, playing. Also to do next file/previous file.


Links on esp8266 gpio pins: esp8266 official forum, rabbithole, github, randomnerdtutorials, instructables, randomnerdtutorials2, techtutorialsx, electronicwings


Only 4 GPIOs left available to use. Used GPIO 2, 3, 15 and 16.


Links on capacitive touch sensor buttons: electroncomponents, electronicscomp, instructables, Julian Lett’s review of the TP224 based 4-touch button module, Great Scott’s how to make your own capacitive touch button


Other links: Mini push button, assorted kit with some caps, self-lock with cap, smd push button, Black Push Button Cap, Black push button switch


Tried three types of buttons: membrane buttons (pair of 4), touch buttons (pair of 4), DPDT self-locking switch (like the ones used in previous projects). Experience:

  • Problems in booting when membrane/touch buttons were attached
  • the membrane and touch buttons try to pull gpio pin low if Vref=GND (or high if Vref=Vcc)
  • this problem does not exist with DPDT self-locking switch type buttons
  • esp has rules about gpio pins i.e., they should be high / low during booting
  • Not all GPIOs support interrupts, e.g. GPIO 16. Solution: continuous polling using while (1)
  • depending on the states of button (initial & when button pressed), negative / positive edge triggered interrupt type is used
  • can’t write big code in an ISR - just xQueueSendFromISR (or a little bit more)
  • can’t use printf in ISR, have to use queues
  • to avoid task watchdog crash, put a delay in while (1) in main so the task gives space to other tasks / the task scheduler. taskYIELD() does not prevent the task watchdog reset


Finalised 2 DPDT self locking switches for recorder and player & 2 touch buttons to jump to next/previous file while playing.


Switch debouncing (using software): global variables updated in ISRs, referred (& also updated) in tasks


Four states for recording

  1. ISR (when button pressed down): recorder_off (0) —> start_recording (1)
  2. Recorder: start_recording(1) —> started_recording(2) {after setup which also introduces sufficient delay to avoid debounce}
  3. ISR (when button released): started_recording(2) —> stop_recording(3)
  4. Recorder: stop_recording(3)—>recorder_off(0)  {after soft stop & cleanup which also introduces sufficient delay to avoid debounce}
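
The four transitions can be written as two small functions, one called from the ISR and one from the recorder task. A sketch with hypothetical names; on the device the state would be a shared volatile variable. Any edge that does not fit the current state (for example contact bounce) leaves the state unchanged.

```c
#include <stdbool.h>

typedef enum {
    RECORDER_OFF      = 0,
    START_RECORDING   = 1,
    STARTED_RECORDING = 2,
    STOP_RECORDING    = 3
} rec_state_t;

/* Called from the button ISR: pressed is true on press, false on release. */
rec_state_t rec_on_button_edge(rec_state_t s, bool pressed)
{
    if (pressed && s == RECORDER_OFF) {
        return START_RECORDING;
    }
    if (!pressed && s == STARTED_RECORDING) {
        return STOP_RECORDING;
    }
    return s;  /* bounce or out-of-order edge: ignore */
}

/* Called from the recorder task once setup or cleanup has finished. */
rec_state_t rec_on_task_done(rec_state_t s)
{
    if (s == START_RECORDING) {
        return STARTED_RECORDING;
    }
    if (s == STOP_RECORDING) {
        return RECORDER_OFF;
    }
    return s;
}
```

Because the ISR only reacts in states 0 and 2 and the task only in states 1 and 3, the setup and cleanup time acts as the debounce window.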

Similar states for playing.

  • No need to stop player automatically after ‘last’ file. Play continuously until stop button is pressed manually.
  • Inter button ISR handling: Entertain ‘start_recording’ only when player is off. Similarly entertain ‘start_playing’ only when recorder is off.



Creating separate vs1053.h

Linker errors: studied usage of ‘static’ and ‘extern’

  • problem: if in vs.h & no ‘static’, linker error; if static in vs.h, ISR in main can’t change value of global variable!
  • solution: declaring with extern in vs.h, defining in main, vs.c can then use it without declaring / defining


Webserver (sharing, AP mode)


  • Reviewed esp_http_server
  • webpage: display list of files with play & download links, oldest on top, download all link
  • creating html as a string in code
  • handler receives filename as param, opens file, reads bytes and sends them via httpd_resp_send_chunk
  • sending uri using request query param so 1 uri handler can be used for all files
  • Not using a button to start/stop sharing since gpio 16 does not have ISR, so sharing remains always ON, continuous polling will keep main alive. Can keep WIFI AP mode continuously ON instead. Esp seems to have no problem in that. No other gpio available.
  • when internet radio (i.e., STN mode) is switched on (manually), switch off AP mode (automatically), play radio, when radio is switched off (manually), switch AP mode back on (automatically)


Internet radio (STN mode)

Example project: Ka-Radio

Multiple radio stations along with details (format etc.)


Learnings while implementing

  • while receiving from socket, we specify length, it will return <= len bytes (but when error it can return ffff also! handle that error in code e.g. skip and continue)
  • header separated (based on search \r\n\r\n), length varies from ~300 to >1500, put max_header_len = 2048, within function, buffer is of 256 (>=512 gives crash - most probably memory related)
  • multiple ISR handlers on same gpio are not allowed, last one over-rides
  • Including port in radio stations: getting rid of ‘hints’ data structure & straightaway getting ip from name using dns server
  • check from vs data sheet: bit rates across formats that are supported
  • for AAC, decoder needs higher clock 4.5 at least, we are using this as default (instead of normal recommended 3.5+1)


Testing of various radio stations (radio browser)

  • 0xffff in the beginning of many stations specially the ones in aac format
  • ignoring it doesn’t help except in case of bbc radio
  • header max size increased to 2048
  • bbc working ok for 20-30 mins then silence - no error/crash
  • ogg 128 works for 5-7 mins, then crashes
  • task watchdog within micros function within us_delay within dreq_wait, solution: used taskyield instead of delay_us
  • vs decoder patch does not help

Questions

  • Why bbc stops after X mins?
  • Why so many stations not working (ffff / bad response)
  • What is ICY protocol?
  • What is diff. b/w http 1.0/1.1? Link 
  • What is chunked encoding?
  • MP3 sends some metadata after every X bytes, do we handle that?
  • Shall we handle full variety of radio stations with different URL, port, bitrate, playlist, autoplay, header, title, format?
  • Edzelf code


Points from reading vs dsp forums

  • The main problem-to-solve is buffering, flow control, out-of-phase packet assembly and error recovery
  • link with buffer size suggestion / TCP receive window
  • Find "ICY-Metaint" equals something in the headers. You need to strip away the ICY metadata packet at those byte intervals.
  • clock speed hint: for aac only it seems
  • May avoid initial burst of data (~50kb)?. Will this solve the ffff problem in many radio stations?

  • Time when dreq goes up & time when we send data, should be analysed with a logic analyzer (later)
  • Information on vs1053 pcb layout (later). VS as a chip has more pins than the LC board exposes. Think what all you can use in PCB eg. UART pins (RX/TX) to have debug info
  • Information on sd card to vs1053 fast transfer on same SPI (later)
  • events priority


Main takeaways / to-try

  • recv_buf size should be large enough to store ~5s of audio data. 128kbps=16kB/s, 5s=80kB. ESP8266 ram is ~45kB so bitrate should be <=72kbps
  • tcp socket buffer / tcp receive window (indication to sender how much data you can handle), RAM buffer. How to allocate 45 KB of RAM to tcp socket buffer?
  • buffer underflow due to slow internet connection, may be clock can be increased further i.e., 5+0 (but this is mostly applicable to AAC only)
  • the icy-metaint response header (sent when the request includes Icy-MetaData: 1) tells after how many bytes the mp3 stream sends metadata. Not sure if this is needed since not much impact on playing even if we don’t handle that recurring metadata
  • avoid initial burst of data (~50kb)?. Will this solve the ffff problem in many stations?
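
If the recurring ICY metadata does need to be stripped, a small filter between the socket and the audio queue is enough. The sketch below assumes the usual ICY framing: after every metaint audio bytes comes one length byte N, followed by N * 16 bytes of metadata. The names icy_filter_t, icy_filter_init and icy_filter are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t metaint;     /* audio bytes between metadata blocks (icy-metaint) */
    size_t audio_left;  /* audio bytes until the next length byte            */
    size_t meta_left;   /* metadata bytes still to skip                      */
} icy_filter_t;

void icy_filter_init(icy_filter_t *f, size_t metaint)
{
    f->metaint = metaint;
    f->audio_left = metaint;
    f->meta_left = 0;
}

/* Copy only audio bytes from in to out (out must hold len bytes).
   State carries over between calls. Returns audio bytes written. */
size_t icy_filter(icy_filter_t *f, const uint8_t *in, size_t len, uint8_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (f->metaint == 0) {            /* no metadata in this stream  */
            out[n++] = in[i];
        } else if (f->meta_left > 0) {    /* inside a metadata block     */
            f->meta_left--;
        } else if (f->audio_left == 0) {  /* length byte: N * 16 follow  */
            f->meta_left = (size_t)in[i] * 16u;
            f->audio_left = f->metaint;
        } else {                          /* ordinary audio byte         */
            out[n++] = in[i];
            f->audio_left--;
        }
    }
    return n;
}
```

The filter would be initialised with the value parsed from the icy-metaint header and called on every chunk received after the HTTP header.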


Decisions

  • Limited bit rate (may be <72kbps)
  • no playlist
  • format mp3 only
  • AP or STN mode at a time (not together)
  • http 1.0 (no chunked encoding needed)
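
The bitrate limit follows from the buffer arithmetic in the takeaways (128 kbps = 16 kB/s, 5 s = 80 kB, ~45 kB of RAM). A sketch of that calculation, taking 1 kbps as 1000 bits/s and 1 kB as 1000 bytes; the function names are hypothetical.

```c
#include <stdint.h>

/* Bytes needed to hold the given seconds of audio at kbps. */
uint32_t buffer_bytes_needed(uint32_t kbps, uint32_t seconds)
{
    return kbps * 1000u / 8u * seconds;
}

/* Highest bitrate (kbps) for which the given seconds fit in ram_bytes. */
uint32_t max_kbps_for_ram(uint32_t ram_bytes, uint32_t seconds)
{
    return ram_bytes * 8u / seconds / 1000u;
}
```

buffer_bytes_needed(128, 5) gives 80000 bytes and max_kbps_for_ram(45000, 5) gives 72, the limit used above. With the 16k queue that was finally used, the same formula shows how many seconds of audio the queue really holds at a given bitrate.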


I/O/Task buffers & synch

  • checking heapsize at various places. understanding individual tasks memory (heap/stack size etc.)
  • need queue (FIFO) size as much as esp permits without crashing - 16k
  • parallel tasks to load and unload audio data  - same speed or which one higher? load as much as you receive, unload 32 bytes at a time
  • initial load more - how much? 90% of queue, radio_task creates load, then waits till queue is 90% full, then creates unload
  • understanding xTicksToWait:

    • if this is infinite, task does not come out/do anything else. Remains in blocked state
    • for recorder, if queue full, display error, and do what? discard bytes in hand, check radio state & fetch new
    • for player, if queue empty, wait for how long? and then do what? 5s, check state: if still STARTED_RADIO, keep waiting, break otherwise 
  • Tests with various sizes of buffer/tcp/task, final settings:
    • tcp task stack size: 2560->3840 (x1.5)
    • main audio data buffer size: 512 —>16k
    • radio task size: 2048 —> basic (without a big variable i.e., 1536)+queue size

  • handling of internet connection not available at start / in the middle of radio
  • use of wifi_stn_event_handler to change state & xQueueSend for radio_task (like ISR): default priority of an event handler is 10 (link). keep load/unload tasks priority < 10
  • both load & unload tasks will detect the change in state, will do cleanups and delete themselves
  • radio task will detect its queue (button press) and change state to OFF along with cleanups - should happen after load/unload delete themselves (introduce enough delays to ensure this or priority of radio_task could be set low)

  • deleting load/unload tasks from radio_task often led to crash - this is because socket/queue gets deleted first! Options considered: suspend tasks instead of delete? don’t delete queue, just clear it up? not preferred due to their memory usage. Finally, using priorities and introducing enough delays helped.
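The load/unload arrangement described above can be sketched in plain C as a byte ring buffer. This is not the project code: the real tasks use a FreeRTOS queue with xTicksToWait, while the fifo_* names here are hypothetical stand-ins. Only the 16k size, the 90% pre-fill and the 32-byte unload come from the notes.

```c
#include <stddef.h>
#include <stdint.h>

#define FIFO_SIZE    (16 * 1024)  /* queue size the tests settled on */
#define PREFILL_PCT  90           /* fill level before unload is started */
#define UNLOAD_CHUNK 32           /* bytes handed to the player per call */

typedef struct {
    uint8_t data[FIFO_SIZE];
    size_t head;   /* next write position */
    size_t tail;   /* next read position */
    size_t count;  /* bytes currently stored */
} fifo_t;

void fifo_init(fifo_t *q)
{
    q->head = q->tail = q->count = 0;
}

/* Load side: store as many bytes as fit and return that number.
 * The caller discards whatever did not fit (the "queue full" case). */
size_t fifo_load(fifo_t *q, const uint8_t *src, size_t len)
{
    size_t n = 0;
    while (n < len && q->count < FIFO_SIZE) {
        q->data[q->head] = src[n++];
        q->head = (q->head + 1) % FIFO_SIZE;
        q->count++;
    }
    return n;
}

/* Unload side: take up to UNLOAD_CHUNK bytes, oldest first.
 * A return value of 0 is the "queue empty" case. */
size_t fifo_unload(fifo_t *q, uint8_t *dst)
{
    size_t n = 0;
    while (n < UNLOAD_CHUNK && q->count > 0) {
        dst[n++] = q->data[q->tail];
        q->tail = (q->tail + 1) % FIFO_SIZE;
        q->count--;
    }
    return n;
}

/* True once the queue is at least PREFILL_PCT percent full. */
int fifo_prefilled(const fifo_t *q)
{
    return q->count * 100 >= (size_t)FIFO_SIZE * PREFILL_PCT;
}
```

In this picture radio_task would start the load side, poll fifo_prefilled, and only then start the unload side. With 16k and 90% that means roughly 14.4k bytes are buffered before the first 32 bytes go to the decoder.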


Networks handling

  • main does basic initializations like nvs, tcp, event loop, wifi_init with default config
  • individual STN and AP start/stop functions do the remaining settings: register events, set mode/id/pwd, start, connect
  • softAP mode is switched ON by default in main
  • when radio_task starts radio, it first switches OFF AP & switches ON STN
  • while stopping radio, it does the reverse
  • No network reconnect attempt is made from the radio/load task. If the connection is lost, the user needs to press the radio button again once it is restored, OR restart the device


Possible future improvements in user interface

  • LEDs with each major functionality e.g. radio on/radio off
  • audio error messages: e.g. internet connection lost
  • LOG file (available in AP mode via html - offline/live)


OTA


Key steps

  1. Upgrade application sits in OTA_0
  2. Main application (good listener) sits in OTA_1
  3. When triggered, the main app changes ota_data to OTA_0 and resets the esp
  4. The upgrade app gets the latest app from driveopedia (http server) and ota_writes it to OTA_1. It then updates ota_data to OTA_1 and resets the esp
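The hand-over between the two apps can be pictured with a small plain-C model. It does not call the real OTA API; device_t and the two functions are hypothetical, and boot_slot merely stands in for the ota_data partition. It also shows the safety check of refusing to write unless the upgrade app is the one that was booted.

```c
/* Which OTA slot the bootloader will start next. */
typedef enum { SLOT_OTA_0 = 0, SLOT_OTA_1 = 1 } slot_t;

typedef struct {
    slot_t boot_slot;         /* stands in for the ota_data partition */
    int    main_app_version;  /* version of the image stored in OTA_1 */
} device_t;

/* Step 3: the main app points ota_data at the upgrade app.
 * On the real device a reset follows. */
void main_app_trigger_upgrade(device_t *d)
{
    d->boot_slot = SLOT_OTA_0;
}

/* Step 4: the upgrade app replaces the image in OTA_1 and hands control
 * back. Returns -1 without touching anything if it was not booted from
 * OTA_0, so a running main app is never overwritten. */
int upgrade_app_run(device_t *d, int new_version)
{
    if (d->boot_slot != SLOT_OTA_0)
        return -1;
    d->main_app_version = new_version;  /* the ota_write into OTA_1 */
    d->boot_slot = SLOT_OTA_1;          /* next reset starts the main app */
    return 0;
}
```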

Handling partitions

  • custom partition table - taking care of offsets & sizes (final)
  • make flash hello_ota into ota_0 - needs to be at least 500K
  • .bin is enough; .ota.bin is not needed
  • Noted header of a bin file
  • API esp_ota_ops.h: esp_ota_begin, esp_ota_write, esp_ota_end, esp_ota_set_boot_partition, followed by esp_restart
  • About fatal errors
  • put checks if running partition is ota_0 & update partition is ota_1 for sure
  • the two ota partitions (ota_0 & ota_1) should each sit in its own contiguous 1M region. Also, the difference between the offset of one & the other must be 1M
  • Explicitly ensure that the next ota partition starts 1024*1024 bytes after the previous one. Put an explicit offset (not mandatory in partition.csv), otherwise the system will fill it in wrongly.
  • If you update partition.csv, go to menuconfig & save so the system picks up the updated csv. Also run make erase_flash
  • two separate spiffs partitions - one for ring, other for all other metadata including headerfile

  • separate partition table for standalone application (to make flash)
  • Ring with 20 spiffs files ~400K in a 440K partition leads to stack related error. Reduced to 10 files for the time being.
  • currently: both ota - 600K, spiffs 430K, working fine but quality of recording seems to have deteriorated
  • https://www.esp8266.com/viewtopic.php?p=80475
  • Need esp8266 with 4M flash. SD card otherwise
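The offset rule for the two ota partitions is easy to get wrong by hand, so a small check like the one below could be run on the numbers before putting them in partition.csv. This is a sketch only: part_t and ota_layout_ok are hypothetical names, and the offsets in the usage note are example values, not the project's actual table.

```c
#include <stdint.h>

#define OTA_SLOT_SPAN 0x100000u  /* 1024*1024 bytes between ota_0 and ota_1 */

/* One row of the partition table, reduced to what the check needs. */
typedef struct {
    const char *name;
    uint32_t offset;
    uint32_t size;
} part_t;

/* Returns 1 if ota_1 starts exactly 1M after ota_0 and each partition
 * fits inside its own 1M region, 0 otherwise. */
int ota_layout_ok(const part_t *ota0, const part_t *ota1)
{
    return ota1->offset - ota0->offset == OTA_SLOT_SPAN
        && ota0->size <= OTA_SLOT_SPAN
        && ota1->size <= OTA_SLOT_SPAN;
}
```

For example, two 600K partitions at offsets 0x10000 and 0x110000 pass the check, while placing ota_1 directly behind ota_0 at 0x10000 + 600K fails it, which is the situation an omitted offset can produce.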

Observations from testing

  • Actually, our esp8266 based esp12E (part of nodemcu v1.0) has 4MB flash!! Can update in menuconfig-->serial flasher config
  • partition tables adjusted for 4MB
  • 20->40 files: worked fine
  • 40->60 files: crash after 49 (stack overflow)
  • recorder task stack size increased 2048->5120, files 60->90: crash after 75 (fputc / spiffs error); some files took long to record (that's where the problem is)
  • fputc/spiffs error after #75 even with stack size 8192: SPIFFS: failed to update mtime. Error re-appears if you restart recorder (using button). To check
  • in the meantime, limit files to 60, task stack size 8192
  • radio did not work, with no error/warning! Noted from printfs that the unload task could not be created. Reduced recorder task stack size 8192->4096, working fine
  • webpage not working on iphone (maybe just the play control). To check


At this point, most of the initial milestones/mini projects (except power management) are done. Next step is creating a product on PCB. 


However, would like to create something useful, not just a replication of work done on breadboard so far.


Product idea based on audio reminders - product description



This required more breadboarding for new components like AVR, temperature sensor, RTC, 7-segment display, various buttons and LEDs.




Idea was to run all non-audio peripherals like temperature sensor and 7-segment display via AVR since ESP8266 is short on GPIOs.


AVR with temperature sensor - experience connecting LM335 with ATMega16


AVR with 7-segment display - experience connecting two 7-segment displays


Date-time capture in ESP8266 - no internal RTC, date-time keeping using software


Connecting ESP and AVR - experience with SPI, custom protocol, one button sending signals to two microcontrollers, capacitive touch buttons


The project was not continued all the way to PCB because the esp8266 does not have enough GPIOs, there is no easy connection between ATMega16 and ESP8266, and we could not wait to move to our next project, which involves video!


Link to code
Link to previous project - Happy Hour
